PREFACE


The science of statistics in this country seems to have been subject
to two main influences. The biometric school associated with Professor
Karl Pearson have developed methods and concepts which form
the basis of the whole subject, and in more recent years the needs
of biological experimentalists have been met by developments
due largely to Dr. R. A. Fisher. There are many text-books
on what may be regarded as the classical theory of statistics, and
Fisher’s own methods are described in his book, Statistical
Methods for Research Workers. In the present book I have attempted
to present a single system of statistics, so that a reader with little
previous acquaintance may obtain a good working knowledge and
understanding of the methods available.

The first chapters deal with frequency distributions and constants,
and with the theory of errors, in orthodox manner, but in the later
chapters the underlying theme is Fisher’s idea of the Analysis of
Variance; correlation is introduced as a special case of this. There
are, of course, other ways of regarding the subject, but its unity
seems to me to be brought out more by this than by any other
method of presentation.

In outlining the theory underlying the methods described, I have 
given mathematical proofs where they are easy; but where they are 
not, I have not hesitated merely to state the results. The necessary 
mathematical attainments of readers are slight and include algebra
up to the binomial theorem and the elementary notions of the
calculus; even those who have to omit the mathematics entirely
should be able to arrive at an appreciation of the arguments. Nevertheless,
the theory of statistics is not easy, not so much because it is
abstruse, as because the ideas are new to most people, and a good 
deal of hard thinking and patient work will be necessary. 

I have drawn largely on the literature for examples, and it will be 
obvious to biologists that I am not one, for some of the statistical 
deductions I make are of no biological interest, although the methods 
applied to other inquiries may be of great value. Statistics is a 
tool, and the good craftsman knows what he wants to make with 
his tool before he lifts it; it is only the small boy who uses it promiscuously.
In this book I have often been like the small boy, but if
readers will be small boys with me, they will learn how to handle
the tool, and then will be able to apply it with biological judgment.
There is no better way of learning statistics than by working through
examples.

I wish to thank Mr. F. D. Farrow, Mr. H. A. Hancock and
Miss R. Hodgson for their help and criticisms, and Mr. Ljl
Galloway and Mr. J. Gregory for providing me with unpublished
material for examples.


August, 1931


L. H. C. TIPPETT


PREFACE TO SECOND EDITION 


For this edition I have revised the book considerably, and in particular
have re-written the first three chapters. There is now more
mathematical theory than before, and although I have not tried to
prove everything, I have aimed at giving typical proofs to show in
what way the various distributions and methods arise. These proofs
are in separate sub-sections, and may be omitted if desired. There 
are other additions that call for no special comment. 

Several topics that were dealt with in the first edition are now 
excluded. Tests of significance based on the correlation ratio are
omitted because they are less convenient than the equivalent z-tests,
and the intra-class correlation coefficient because it is now only
of historical interest. I have not included the theory of ranking
because (1) it is not often needed, (2) when required it is usually
for application to small samples whereas the theory given before 
was for large samples, and (3) I am not aware of any convenient 
methods involving ranking that are as well founded as the other 
methods given in this book. 

I wish to acknowledge with thanks the criticisms, suggestions and 
corrections received from many friends and correspondents, and am 
particularly indebted to Dr. A. J. Turner for reading and commenting
on the manuscript of this revised edition.

L. H. C. T.

June, 1937






PREFACE TO THIRD EDITION 

Every year I become aware of new directions in which I think this
book could be improved, and developments in the subject of statistics
make periodical revisions desirable. The changes I would wish
to make at this stage, however, are not sufficiently important to
justify any considerable amount of resetting of the type, particularly
as “there’s a war on,” and so this edition is the same as the second
except for a few minor corrections, and one important addition.
The addition is a set of tables, given at the end of the book, of
the normal probability integral, t and tables based on Fisher’s z
for testing the significance of differences between variances. These
tables are a shortened and, in some instances, slightly modified form
of those in R. A. Fisher’s Statistical Methods for Research Workers,
and are sufficient for many of the experimenter’s everyday requirements.
I acknowledge with pleasure the generosity of Professor
Fisher and his publishers, Messrs. Oliver & Boyd, in allowing me
to reproduce these tables.

L. H. C. T.

March, 1941
contains many individuals; indeed for the latter it is the most frequent,
showing that most of the 266 species are found only in fewer
than eight vice-counties. This is the form of the “hollow curve”
mentioned in Willis’s book, Age and Area, and similar distributions
are given by the number of petals of some flowers. The distribution
of degrees of cloudiness is of a very unusual form; it shows that
for July the sky is usually either nearly completely overcast or
completely clear, and that comparatively seldom is there a moderate
amount of cloudiness. Sometimes there are two or more humps or
modes as they are called, as in the rays of chrysanthemums, and such
a distribution usually indicates some mixture of types in the data.
Care must be taken, of course, not to mistake some small irregularity,
such as often occurs, for a second mode.

Formation of Frequency Tables 

1.11. The determination of the appropriate size of sub-group of a
frequency table or the number of classes into which the total range
is divided is a compromise between giving too much and too little
detail. If the variate is continuous and the sample is large, it has
been found convenient to have between ten and twenty classes, but
if the number of readings is less than 100 fewer groups show up
better the essential features of the distribution. Care should be
taken not to make the grouping too fine relative to the unit of
measurement of the character. In dealing with the age returns of
men, for instance, it is usually inadvisable to choose sub-ranges
of less than a year, for many people give their age last birthday, so
that the groups representing integral years contain an undue number
of observations, while those representing fractions have comparatively
few. If the variate is discrete, the natural unit of grouping
is the unit of variation, as in Table 1.1, (ii) and (vi), unless there
are too many for the size of the sample and the distribution is
irregular; in such instances, several units may be combined to form
one group as in the fourth example of Table 1.1, where the 71 vice-counties
have been divided into 10 groups.

If the variate is continuous, the sub-ranges must also be continuous;
there must be no gap between the highest value in one
and the lowest in the next. If this is borne in mind when making
the measurements, there need be little difficulty in forming the
distribution, but sometimes the data are collected without reference
to the fact that they may have to be grouped, and then care is
necessary. A common way of recording data is correct to the nearest
unit. Suppose, for example, the data in Table 1.3 are of a continuous
variate and are correct to the nearest unit, then the system
of classification into groups of three units is unsatisfactory in that
there is a gap between the sub-ranges; the first nominally stops
at 29 and the second starts at 30 units, so where does an individual
with an actual value of 29.7 (say) go? The readings recorded
as 29 units may be anywhere between 28.5 and 29.5 units, so the
sub-range which nominally extends from 27 to 29 units actually
extends from 26.5 to 29.5; the nominal sub-range 30-32 actually
extends from 29.5 to 32.5 units, and so on. The actual sub-ranges
join up.
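The conversion from nominal to actual sub-ranges can be sketched in a few lines of Python. This is only an illustration under the assumption stated in the text, that every reading is recorded correct to the nearest unit; the function names `true_limits` and `classify` are hypothetical and not part of the original.

```python
def true_limits(nominal_low, nominal_high, unit=1.0):
    """Actual sub-range covered by a nominal class.

    Assumes readings are recorded to the nearest `unit`, so a recorded
    value v stands for any actual value within half a unit of v.
    """
    half = unit / 2.0
    return (nominal_low - half, nominal_high + half)


def classify(value, first_nominal_low, width, unit=1.0):
    """Index (0, 1, 2, ...) of the class that holds an actual value."""
    start = first_nominal_low - unit / 2.0
    return int((value - start) // width)


# Nominal classes of three units, as in the example in the text.
nominal = [(27, 29), (30, 32), (33, 35)]
actual = [true_limits(low, high) for low, high in nominal]
print(actual)

# An individual with an actual value of 29.7 falls in the second class.
print(classify(29.7, first_nominal_low=27, width=3))
```

With the true limits, each class begins exactly where the previous one ends, so every actual value has one and only one class.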

Qualitative data may, of course, be classified and formed into a 
frequency distribution in which the sub-groups are the qualities described;
in dealing with hair colour, for instance, red, fair, light brown,
dark brown, and jet black may be taken as the separate classes. 

Frequency Constants 

1.20. Although frequency diagrams are useful as giving a visual
impression of the characteristics of a sample, the evidence of the
eye alone is not enough, and some numerical measure of the important
characteristics should be used for more exact description and comparison.
The determination of such a measure carries the process
of condensing the original data a stage further than the formation
of a frequency distribution, and this can only be done by ignoring
further features as being irrelevant to the particular investigation.
Sometimes a special constant may be designed for some particular
purpose; for example in the kinetic theory of gases, the mean square
velocity is a sufficient description of the distribution of velocities
of the molecules for the purpose of establishing energy relationships.
If the frequency diagram is of unusual shape, e.g. multi-modal,
special methods of expression may be necessary, but for most
purposes and for distributions commonly met with, there are
standard frequency constants that have been devised to express
various characteristics. These constants will be described here. It
may be emphasised at the outset, however, that in using constants
the investigator should always keep the practical problem in view
and choose methods that are appropriate. There is no virtue in
calculating statistical constants in a blind, routine manner, without
knowing first that they will be of some use.
Proportionate Frequencies

1.21. For many purposes, the investigator is only interested in the
proportion of individuals having characters below or above a certain
value, or between two. It can well be imagined that the U.S.
recruiting official would be interested in knowing the proportion
of men below (say) 65 inches in height, so that he would know to
what extent the number of recruits would fall off if he were to make
that the minimum height allowable, and from Table 1.1 (i) we see
that the ratio is 6 825 : 25 878, or about 26.4 per cent.
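The arithmetic of the recruiting example can be checked with a short Python sketch. The two totals are the ones quoted in the text; the list `counts` is a made-up frequency table used only to show the general conversion, and the function name `proportionate` is hypothetical.

```python
def proportionate(frequencies):
    """Convert class frequencies into fractions of the total."""
    total = sum(frequencies)
    return [f / total for f in frequencies]


# Figures quoted in the text: 6 825 men below 65 inches out of 25 878.
percentage_below = round(100 * 6825 / 25878, 1)
print(percentage_below)  # 26.4

# Hypothetical frequency table (not from the book), for illustration only.
counts = [5, 20, 50, 20, 5]
props = proportionate(counts)
print(props)
```

Expressed in this way the distribution no longer depends on the arbitrary size of the sample.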

It usually (but not always) happens that the size of the sample 
is governed by some arbitrary conditions that have no objective 
significance, so that the distribution is expressed in a more general 
form if the frequencies are given as fractions or percentages of the 
total. 

When a distribution is represented diagrammatically by a histogram,
the proportionate frequency between two limits is the area
under the diagram between two ordinates drawn at those limits.

Frequencies and proportionate frequencies underlie nearly all
methods of statistical representation. Whatever constants may be
calculated or however elaborate may be the analysis, the final interpretation
is in terms of frequencies. If the data are sufficiently
numerous and the problem is simple enough for an answer in terms 
of frequencies to be obtained directly, there is no need to compute 
further constants. 

Position 

1.22. The location or position of a frequency distribution is some
single measure describing a value of the variate about which the
observations are scattered. For example, it may be the bull’s-eye
aimed at by our marksman. There are several constants for describing
this.

The most common constant is the mean or average, which is the
sum of all the observations divided by the number.* It is well
known and understood, but while admitting the great usefulness of
the mean (it is perhaps the most useful constant of all) we would
remind readers that it only measures one characteristic of a distribution,
and that there are others of great importance.

* This is the arithmetic mean; the geometric and harmonic means are
not so much used in statistics.
An alternative measure of position is the median. If all the
observations in a sample are ranked in order of value, the median
is the value of the middle one if the total number is odd, or a value
between the two middle ones if the total number is even. The
number of individuals greater than the median value is equal to
the number that are smaller, and an ordinate drawn at that value
divides a histogram into two equal areas. The median is troublesome
to find if the sample is large and the grouping broad, for all the
individuals in the group containing the median have to be ranked in
order. In the third example of Table 1.1 there are 1 117 observations,
so the median is the value of the 559th placed in order, and lies
somewhere between 16.5 and 20.5 scale divisions. In that group
of 94 individuals the median is the 6th in order of magnitude, and
we may take it as being approximately $16.5 + \frac{6}{94} \times 4 = 16.76$
divisions.
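The interpolation just described can be written as a small Python function. This is a sketch only: the function name and arguments are hypothetical, and the figure of 553 observations lying below the median group is inferred from the statement that the 559th observation is the 6th in its group.

```python
def grouped_median(lower_limit, width, freq_in_group, cum_below, n_total):
    """Median by linear interpolation within the group that contains it.

    lower_limit   : true lower limit of the median group
    width         : width of that group
    freq_in_group : number of observations in that group
    cum_below     : number of observations below that group
    n_total       : total number of observations (assumed odd here)
    """
    rank = (n_total + 1) / 2          # 559 for 1 117 observations
    position = rank - cum_below       # order of the median within its group
    return lower_limit + position / freq_in_group * width


median = grouped_median(lower_limit=16.5, width=4.0,
                        freq_in_group=94, cum_below=553, n_total=1117)
print(round(median, 2))  # 16.76
```

The interpolation assumes the 94 individuals are spread evenly over their sub-range, which is why the result is only approximate.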

The mode is that value of the variate about which the observations 
are most concentrated, that is the value at which the ordinate of the 
frequency diagram is highest, and in Table 1.1 (i), for instance, it is
between 66 and 67 inches. It is not always easy to define accurately
in a sample, for it may be anywhere in the most frequent group,
or if the distribution is very flat at the top, and irregular, it is not
even easy to decide which would probably be the modal group in
the whole population; moreover, a distribution may have more than
one real mode, as has already been illustrated. Usually, however, if
there is a single mode, the position can be found by assuming
Pearson’s system of frequency curves as shown in section 1.24.

When the distribution is symmetrical, the mean, median and mode
always coincide, and in all other single-modal cases the median comes
between the mean and the mode, the dispersion between these three
constants being a measure of the extent of asymmetry of the distribution.
For practical purposes, the following formula holds approximately
if the asymmetry is only moderate:

$$\text{Mean} - \text{Mode} = 3(\text{Mean} - \text{Median})$$

Of these constants, the mean is the most fundamental from a 
theoretical point of view and is the only one that can be used in 
further analysis of the data. For the mere representation of the 
central tendency of a distribution, the median is sometimes recommended
because it is said to be least affected by extreme individuals,
but for most symmetrical uni-modal distributions, the mean is the
most stable constant and is least affected by idiosyncrasies of the
particular sample. This result is derived from the theory of errors.
When the variate is the life of the individuals under some test of
endurance, e.g. the life of electric lamps when burnt at a standard
voltage, the median may be an economical constant to determine;
for when half the individuals have failed, the median value may be
determined without any further testing. Since the mode is the most
typical value, it may be the most suitable single constant to use
when the distribution is very skew.

Variability or Dispersion

1.23. There are several constants devised to measure the degree of
variation or dispersion of the variate, which, in our analogy of
section 1.1, is the quality that distinguishes the marksmen. The
most fundamental of these are the second moment, or variance, and
its square root, called the standard deviation. The second moment is
the mean squared deviation from the mean, and may be written

$$\mu_2 = \frac{S(x - \bar{x})^2}{N}$$

where $\mu_2$ is the second moment, $x$ the successive values of the variate
in the sample, $\bar{x}$ the mean, $N$ the number of individuals, and $S$ the
summation for all values of $x$. The standard deviation is

$$\sqrt{\mu_2} = \sqrt{\frac{S(x - \bar{x})^2}{N}}$$

and will usually be denoted by the symbol $s$ or $\sigma$. When the standard
deviation is expressed as a percentage of the mean, it is called the
coefficient of variation. The computation of some of these constants
will be illustrated at the end of the chapter.

Another measure, the mean deviation, is often used. It is the sum of 
the deviations from the mean, irrespective of sign, divided by the 
number of observations. For frequency distributions of the “normal”
form (which will be described in section 2.5), the standard deviation
is 1.253 times the mean deviation. This is the basis of Peter’s method
of estimating the standard deviation. 
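As a hedged illustration of these definitions, the Python sketch below computes the second moment, standard deviation, coefficient of variation and mean deviation for the ten observations that appear in Table 1.4. The function name `dispersion_constants` is hypothetical. The divisor is N, following the definition given here, not N - 1.

```python
import math


def dispersion_constants(values):
    """Second moment, standard deviation, coefficient of variation
    (per cent) and mean deviation, all with divisor N."""
    n = len(values)
    mean = sum(values) / n
    second_moment = sum((x - mean) ** 2 for x in values) / n
    sd = math.sqrt(second_moment)
    mean_deviation = sum(abs(x - mean) for x in values) / n
    return {
        "mean": mean,
        "second_moment": second_moment,
        "standard_deviation": sd,
        "coefficient_of_variation": 100 * sd / mean,
        "mean_deviation": mean_deviation,
    }


# The ten observations of Table 1.4.
sample = [67, 70, 76, 73, 67, 73, 70, 73, 70, 79]
constants = dispersion_constants(sample)
for name, value in constants.items():
    print(f"{name}: {value:.4f}")

# For a "normal" distribution the ratio below would be near 1.253;
# ten observations are far too few to expect close agreement.
print(constants["standard_deviation"] / constants["mean_deviation"])
```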

The quartile deviation is the deviation measured on either side of 
the mean, so chosen that as many observations lie beyond the quartile 
value as lie between it and the median, and ordinates drawn at the 
median and at the two quartiles divide the area under a histogram
into four equal parts. If the distribution is asymmetrical, the upper
and lower quartile deviations are not equal. The average of the two
is half the difference between the two quartile values and is called
the semi-interquartile distance. In Table 1.1 (iii), if all the observations
were placed in order, we should have:

279 obs. | first quartile | 279 obs. | median | 279 obs. | third quartile | 279 obs.

The first quartile lies between 8.5 and 10.5 scale divisions, and
the third between 28.5 and 32.5 divisions, and the quartile deviations
are the distances of these values from the mean. Historically,


TABLE 1.2

Number in Sample     Mean Range / Standard Deviation
        2                       1.128
        5                       2.326
       10                       3.078
       50                       4.498
      100                       5.015
      500                       6.073


this was one of the earliest measures of dispersion used, but there
is no reason why the deviation of the ordinates dividing the frequency
diagram in any other proportions should not be used and, indeed,
K. Pearson has shown (1920) that the deviation of the ordinate which
cuts off 1/14th of the area determines the dispersion most accurately
for the “normal” distribution.

At first sight the range, which is the difference between the greatest 
and smallest members of a sample, would appear to be the most 
natural index of dispersion; but it unfortunately varies for samples 
of different sizes taken from the same population, being smaller 
for the smaller samples. When the population form and sample 
size are constant, however, the mean range is proportional to the 
standard deviation, and if their ratio is known the latter can be 
found from the former. For the “normal” form of distribution, 
full tables are given in Biometrika, XVII (1925), p. 386, of the
ratio

Mean Range / Standard Deviation

for samples of between 2 and 1 000 individuals, and an abstract of
them is given in Table 1.2; from this we may see how much the range
depends on the size of the sample. Of course, single values of the
range are liable to rather large chance errors, but if it is convenient


TABLE 1.3

             54    55    41    70    42    50    53    31    49    47
             49    59    51    56    40    52    52    56    41    28
             50    49    57    57    44    53    72    56    53    60
             53    54    32    47    41    43    56    59    53    47
             70    62    56    44    54    49    50    35    53    38
Means .    55.2  55.8  47.4  54.8  44.2  49.4  56.6  47.4  49.8  44.0
Ranges .     21    13    25    26    14    10    22    28    12    32

             54    27    39    52    49    57    28    48    36    48
             27    46    58    65    49    38    51    42    48    47
             48    60    66    60    44    55    49    46    55    42
             41    62    59    41    49    56    50    62    53    69
             47    48    63    62    64    47    53    59    47    46
Means .    43.4  48.6  57.0  56.0  51.0  50.6  46.2  51.4  47.8  50.4
Ranges .     27    35    27    24    20    19    25    20    19    27



to collect the data in groups of 5 or 10 (say), the mean range can be 
found quite quickly, and when divided by the appropriate constant 
obtained from the tables, can be converted to the standard deviation. 
The ratio does not seem to be very sensitive to moderate changes 
in the form of the distribution, and this method may be used in most 
practical experience when the frequency distribution is uni-modal 
and tails off fairly gradually to zero at the extremes. Because of its 
ease of computation, the range is convenient in routine work. In 
Table 1.3 there are 100 individuals from an artificially constructed
“normal” population, and these are arranged in random groups of 5.
The ranges are given, and their mean is 22.3. The constant given in
Table 1.2 for samples of 5 is 2.326, whence we obtain for an estimate
of the standard deviation a value 22.3/2.326 = 9.6; this is not far
from the true standard deviation, which happens to be 10 units.

This method of obtaining the standard deviation is only equivalent 
to the other method when the individuals are mixed and divided 
into sub-samples at random. 
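The range method can be followed in a short Python sketch using the twenty ranges of Table 1.3 and the ratios of Table 1.2. The names `RANGE_RATIO` and `sd_from_ranges` are hypothetical. The range of the eighth group is taken as 28, the value that agrees with the printed observations and with the stated mean range of 22.3.

```python
# Ratio of mean range to standard deviation for "normal" samples (Table 1.2).
RANGE_RATIO = {2: 1.128, 5: 2.326, 10: 3.078, 50: 4.498, 100: 5.015, 500: 6.073}


def sd_from_ranges(ranges, group_size):
    """Estimate the standard deviation from the mean range of equal,
    randomly formed groups; returns (mean range, estimate)."""
    mean_range = sum(ranges) / len(ranges)
    return mean_range, mean_range / RANGE_RATIO[group_size]


# Ranges of the twenty groups of 5 in Table 1.3 (eighth range taken as 28).
ranges = [21, 13, 25, 26, 14, 10, 22, 28, 12, 32,
          27, 35, 27, 24, 20, 19, 25, 20, 19, 27]

mean_range, sd_estimate = sd_from_ranges(ranges, group_size=5)
print(round(mean_range, 1), round(sd_estimate, 1))  # 22.3 9.6
```

Only group sizes listed in Table 1.2 are covered by this sketch; other sizes would need the full tables referred to in the text.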

All these measures of dispersion may be rather bewildering to the 
reader, but they are simply so many alternatives, which are exactly 
related for any given form of frequency distribution. In view of the 
theory of the next chapter and section 3.7, it will be well to regard 
the standard deviation as the fundamental measure of scatter, and 
the other constants as approximations to it. It is sometimes thought 
that these measures of variability apply only to distributions of the 
“normal” type, but this is not so; they are equally applicable to
skew distributions, and, provided the shape is the same, may be 
used to compare one distribution with another. 

A qualitative appreciation of dispersion or variation may be 
acquired with little experience, and for comparative purposes any 
of the above measures may be appreciated fairly easily; the more 
precise interpretation of the standard deviation in terms of frequencies 
is dealt with in section 2.52. 


Shape 

1.24. The shape of a uni-modal frequency distribution may vary
in two ways, in the degree of asymmetry, or in the flatness of the
mode. This flatness of the mode (or kurtosis) is different from that
flatness of the curve as a whole which arises from the dispersion,
and is illustrated in Fig. 2, where there are several curves having
the same standard deviation but varying kurtosis. These properties
may be measured by constants derived from the third and fourth
moments of the distribution. The third moment is

$$\mu_3 = \frac{S(x - \bar{x})^3}{N},$$

and the fourth moment is

$$\mu_4 = \frac{S(x - \bar{x})^4}{N},$$

where the symbols on the right-hand side of the equations have the
same meaning as those used in defining the second moment. K. Pearson
has derived two constants from the moments, which are independent
of the dispersion of the distribution, and describe its shape.
They are

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} \quad\text{and}\quad \beta_2 = \frac{\mu_4}{\mu_2^2}.$$

$\beta_1$ is zero if the distribution is symmetrical, while $\beta_2$ is usually about
3 for a curve like that of Fig. 1.1 (i), is smaller if the mode is flatter,
and larger if the curve is more sharply peaked. In Fig. 2 are given
a few smooth frequency curves with different values of $\beta_1$ and $\beta_2$.
They all have positive skewness, and a similar series with negative
skewness can be imagined.

The degree of skewness may be more simply measured by the
quantity

$$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}.$$

This quantity is negative if the curve shows a bias to the right
(assuming that ascending values of the variate are taken from left
to right) and has the longer tail to the left. The position of the
mode, as we have shown, is not easy to define, but if the curve comes
within Pearson's* system, the following formula may be used:

$$\text{Skewness} = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)},$$

in which $\sqrt{\beta_1}$ is given the same sign as $\mu_3$. From this we obtain the
equation for fixing the mode of a distribution relative to its mean,

$$\text{Mean} - \text{Mode} = \frac{\sigma\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}.$$


These constants of shape are rather difficult to interpret, and it is 
only after considerable experience that their practical import can 
be appreciated. 
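A minimal Python sketch of the shape constants may help to fix the definitions. The function names are hypothetical, the small sample is made up for illustration, and the skewness formula is only meaningful when the distribution falls within Pearson's system, as the text notes.

```python
import math


def central_moment(values, order):
    """Mean of the given power of deviations from the mean (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n


def shape_constants(values):
    """Return (beta1, beta2, mu3) as defined from the moments."""
    mu2 = central_moment(values, 2)
    mu3 = central_moment(values, 3)
    mu4 = central_moment(values, 4)
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2, mu3


def pearson_skewness(beta1, beta2, mu3):
    """Skewness from beta1 and beta2; sqrt(beta1) takes the sign of mu3."""
    root_beta1 = math.copysign(math.sqrt(beta1), mu3)
    return root_beta1 * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))


def mean_minus_mode(sd, beta1, beta2, mu3):
    """Distance of the mode below the mean, from the skewness formula."""
    return sd * pearson_skewness(beta1, beta2, mu3)


# A symmetrical, made-up sample: beta1 should be zero.
beta1, beta2, mu3 = shape_constants([1, 2, 3, 4, 5])
print(beta1, beta2)

# Illustrative values of beta1 and beta2 (not taken from the book).
print(pearson_skewness(0.04, 3.0, mu3=1.0))
```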


* This will be mentioned in section 2.6.


Computation of Moments
Moments from Ungrouped Data

1.31. It is always possible to compute the mean and the higher
moments in a sample by straightforward evaluation of the expressions
given in sections 1.22-1.24 as definitions. For example, the
mean and second moment are calculated directly in the first three
columns of Table 1.4 for a small sample of ten individuals. We have
not yet dealt with the theory of methods for such small samples,
and this one is given here merely to illustrate arithmetical processes
that are applicable to small and large samples. The sums of the
columns are given at the foot of the table, and from them it is seen
that the mean, $\bar{x} = 71.8$ and the second moment $\mu_2 = 12.96$.

Such calculations are much facilitated by making a transformation
of the values of the variate, measuring them as deviations from
some arbitrary origin or “working mean” and dividing the deviations
by an arbitrary constant; this last step is equivalent to representing
the variation on an arbitrary scale.


TABLE 1.4

     x      (x - x̄)    (x - x̄)²      x'      x'²
    67       - 4.8       23.04       - 1       1
    70       - 1.8        3.24         0       0
    76       + 4.2       17.64       + 2       4
    73       + 1.2        1.44       + 1       1
    67       - 4.8       23.04       - 1       1
    73       + 1.2        1.44       + 1       1
    70       - 1.8        3.24         0       0
    73       + 1.2        1.44       + 1       1
    70       - 1.8        3.24         0       0
    79       + 7.2       51.84       + 3       9

   718         0.0      129.60       + 6      18


If $x'$ is the transformed variate, $X$ the arbitrary origin, and $h$ the
arbitrary constant or scale,

$$x' = \frac{x - X}{h} \quad\text{and}\quad x = X + hx' \qquad (1.1)$$

Further, let the mean of the values of $x'$ be $\bar{x}'$, and the mean of
their $s$th power be $\mu_s'$, so that

$$\mu_s' = \frac{S(x'^s)}{N}.*$$

* The letter $\mu$ means that the constant is the mean of some power of
deviations, the subscript $s$ denotes the order of the power (e.g. for the
second moment, $s = 2$) and the prime indicates that the deviations are from
an arbitrary origin in arbitrary units. Consistent with this notation $\bar{x}' = \mu_1'$.
Where there is no prime, the deviations are measured from the mean in the
units of the variate, and the $\mu_s$ is the $s$th moment.
Then it may be shown that:

$$
\begin{aligned}
\bar{x} &= X + h\bar{x}', \\
\mu_2 &= h^2(\mu_2' - \mu_1'^2), \\
\mu_3 &= h^3(\mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3), \\
\mu_4 &= h^4(\mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4).
\end{aligned} \qquad (1.2)
$$

To illustrate the use of these equations the mean and second moment
have been calculated for the data of Table 1.4. The arbitrary origin
$X = 70$, the constant $h = 3$, and the values of $x'$ are given in the
fourth column. From their sum, $\bar{x}' = 0.6$ and

$$\bar{x} = 70 + 3 \times 0.6 = 71.8.$$

The sum of the values of $x'^2$, given in the fifth column, gives
$\mu_2' = 1.8$, and from equation (1.2),

$$\mu_2 = 9(1.8 - 0.36) = 12.96.$$


These results agree with those calculated directly, as indeed they 
should. Readers are recommended to calculate the third and fourth 
moments for Table 1.4 by the two methods. 
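The exercise suggested above can be checked numerically. The Python sketch below computes the mean and the second, third and fourth moments of the Table 1.4 data both directly and through the working-mean transformation with origin 70 and scale 3. The function names are hypothetical, and the expressions for the third and fourth moments are the standard binomial expansions.

```python
def moments_direct(values):
    """Mean and moments about the mean, from the definitions (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    mu = [sum((x - mean) ** s for x in values) / n for s in (2, 3, 4)]
    return (mean, *mu)


def moments_via_working_mean(values, origin, scale):
    """Same constants, computed from the transformed variate x' = (x - X)/h."""
    n = len(values)
    coded = [(x - origin) / scale for x in values]
    m1, m2, m3, m4 = (sum(c ** s for c in coded) / n for s in (1, 2, 3, 4))
    mean = origin + scale * m1
    mu2 = scale ** 2 * (m2 - m1 ** 2)
    mu3 = scale ** 3 * (m3 - 3 * m2 * m1 + 2 * m1 ** 3)
    mu4 = scale ** 4 * (m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4)
    return (mean, mu2, mu3, mu4)


values = [67, 70, 76, 73, 67, 73, 70, 73, 70, 79]   # Table 1.4
direct = moments_direct(values)
coded = moments_via_working_mean(values, origin=70, scale=3)
print(direct)
print(coded)
```

The two sets of results should agree apart from rounding in the last decimal places, whatever origin and scale are chosen.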

1.311. The proof of equations (1.2) depends on applying the
ordinary rules of addition to summations denoted by the sign $S$.*
The first is that the summation of a compound term itself made
up of the sum of several terms is equal to the sum of the summations
of the separate terms. For example,

$$S(X + hx') = SX + Shx'.$$

The second rule is that the summation of a term that is
multiplied by a factor constant for all values of the term summed,
is equal to that constant multiplied by the summation of the term.
For example,

$$Shx' = hSx'.$$

The general significance of these rules should be appreciated.

* Readers will do well to master these rules for the sake of following later
chapters.

From equation (1.1) and the above examples of the summation
rules the first part of equation (1.2) follows easily. It also follows
that

$$x - \bar{x} = h(x' - \bar{x}'),$$

and

$$\mu_s = \frac{S(x - \bar{x})^s}{N} = \frac{S\,h^s(x' - \bar{x}')^s}{N}.$$

Taking the constant term outside the $S$ sign and expanding the
term in brackets by the binomial theorem, we find that

$$\mu_s = \frac{h^s}{N}\,S\left\{x'^s - s\,x'^{s-1}\bar{x}' + \frac{s(s-1)}{2!}\,x'^{s-2}\bar{x}'^2 - \dots + (-1)^{s-1}s\,x'\bar{x}'^{s-1} + (-1)^s\bar{x}'^s\right\}.$$

Summing this term by term, and remembering that $\bar{x}' = \mu_1'$ and
is constant for all individuals in the sample, we combine the last
two terms, and find that

$$\mu_s = h^s\left\{\mu_s' - s\,\mu_{s-1}'\mu_1' + \frac{s(s-1)}{2!}\,\mu_{s-2}'\mu_1'^2 - \dots + (-1)^{s-1}(s-1)\mu_1'^s\right\}.$$

If $s$ is successively put equal to 2, 3 and 4, the last three of equations
(1.2) result.


Moments from Grouped Data 

1.32. When calculating the moments of a frequency distribution,
it is usual to assume that all the individuals in any one group have
the central value of that group, i.e. the value midway between the
limits of the sub-range. For Table 1.1 (i) the central values are
51.5, 52.5, etc., inches. These values may be transformed as shown
in section 1.31. Then to calculate the moments, instead of summing
over all individuals, it is possible to multiply the appropriate power
of the transformed group value by the frequency and to sum these
products over all groups. Let $x_t'$ be the $t$th group value (the subscript
denotes that the transformed variate can only take discrete values
corresponding to the central values of the groups), $n_t$ the frequency
in that group, and $S$ be the summation over all groups, then for
such data,

$$Sn_t = N \quad\text{and}\quad \nu_s' = \frac{S(n_t x_t'^s)}{N} \qquad (1.3)$$

The letter $\nu'$ is used for the mean of a power of a transformed
variate calculated from grouped data where $\mu'$ would be used if the
data were ungrouped.

For example, the data in Table 1.4 may be grouped and the sum
of the fourth column is

2 × (-1) + 3 × 0 + 3 × 1 + 1 × 2 + 1 × 3.

Using equations (1.3) and applying equations (1.2) to the resulting
means as shown above, it is possible to compute the crude mean
and moments from grouped data. These moments are denoted by
$\nu_2$, $\nu_3$ and $\nu_4$. For a continuous variate some of the crude moments $\nu$
differ slightly from the true moments denoted by $\mu$, even when
calculated from very large (infinite) samples, because the discontinuous
form of a frequency distribution with finite groups is only
an approximation to the continuous form that is possible with a
continuous variate. Sheppard’s corrections, when applied to such
crude moments calculated from a table with uniform sub-ranges,
give values that approximate more closely to the true moments.
The corrected mean and moments so obtained are:

$$
\begin{aligned}
\bar{x} &= \text{crude mean}, \\
\mu_2 &= \nu_2 - \tfrac{1}{12}h^2, \\
\mu_3 &= \nu_3, \\
\mu_4 &= \nu_4 - \tfrac{1}{2}\nu_2 h^2 + \tfrac{7}{240}h^4.
\end{aligned} \qquad (1.4)
$$

The above transformations make the computation of moments
from grouped data a comparatively easy matter if the arbitrary
origin X is chosen to be the central value of one of the groups
near the centre of the whole distribution and $h$ is made equal to
the sub-ranges, so that the values of the transformed variate $x'$ are
0, 1, 2, 3, ..., −1, −2, −3, ... etc. The corrections (1.4) can
then be applied to the arbitrary moments, putting $h$ equal to 1, and
the final true moments be found by multiplying by $h^2$, $h^3$ and $h^4$;
the $\beta$ coefficients, being ratios, are found directly.

The whole process is followed in the example of Table 1.5, the
data being the heights of 1 078 fathers (Pearson and Lee, 1903).
Column (1) contains the sub-ranges, and column (2) the values of
the centres of the sub-ranges in the arbitrary units, $x'$. Column (3)
contains the frequencies (the 0.5 frequencies arise because when
an observation falls exactly on the border-line between two groups,


TABLE 1.5

   (1)         (2)         (3)         (4)         (5)          (6)
 Stature in  Arbitrary   Frequency
 Inches      Units x′    n_t         n_t x′      n_t x′^2     n_t x′^3

 58.5-         −9          3           −27         243         −2 187
 59.5-         −8          3.5         −28         224         −1 792
 60.5-         −7          8           −56         392         −2 744
 61.5-         −6         17          −102         612         −3 672
 62.5-         −5         33.5        −167.5       837.5       −4 187.5
 63.5-         −4         61.5        −246         984         −3 936
 64.5-         −3         95.5        −286.5       859.5       −2 578.5
 65.5-         −2        142          −284         568         −1 136
 66.5-         −1        137.5        −137.5       137.5         −137.5
 67.5-          0        154             –           –             –
 68.5-          1        141.5         141.5       141.5          141.5
 69.5-          2        116           232         464            928
 70.5-          3         78           234         702          2 106
 71.5-          4         49           196         784          3 136
 72.5-          5         28.5         142.5       712.5        3 562.5
 73.5-          6          4            24         144            864
 74.5-          7          5.5          38.5       269.5        1 886.5

 Total          –      1 078          −326       8 075         −9 746

 ν′_1 = −326/1 078   = −0.302 41

 ν′_2 = 8 075/1 078  =  7.490 72
         −ν′_1^2     = −0.091 45
                        --------
            ν_2      =  7.399 27
           −1/12     = −0.083 33
                        --------
            μ_2      =  7.315 94
             σ       =  2.704 8 inches

 ν′_3 = −9 746/1 078 = −9.040 8
       −3ν′_2 ν′_1   = +6.795 8
                        -------
                       −2.245 0
        +2ν′_1^3     = −0.055 3
                        -------
       ν_3 = μ_3     = −2.300 3
            β_1      = (−)0.013 513

TABLE 1.5 (continued)

  (2)      (7)         (8)          (9)        (10)       (11)         (12)      (13)
  x′     n_t x′^4   w = (x_t−x̄)/σ    A         N × A     n_t          z       y = Nz/σ
                                                        (expected)

  −9    19 683      −3.030 8     0.001 22       1.3        1.3       0.004 0      1.6
  −8    14 336      −2.661 0     0.003 90       4.2        2.9       0.011 6      4.6
  −7    19 208      −2.291 3     0.010 97      11.8        7.6       0.028 9     11.5
  −6    22 032      −1.921 6     0.027 33      29.5       17.7       0.063 0     25.1
  −5    20 937.5    −1.551 9     0.060 34      65.0       35.5       0.119 7     47.7
  −4    15 744      −1.182 2     0.118 56     127.8       62.8       0.198 3     79.0
  −3     7 735.5    −0.812 5     0.208 25     224.5       96.7       0.286 8    114.3
  −2     2 272      −0.442 8     0.328 96     354.6      130.1       0.361 7    144.2
  −1       137.5    −0.073 1     0.470 86     507.6      153.0       0.397 9    158.6
   0         –       0.296 7     0.616 65     664.7      157.1       0.381 8    152.2
   1       141.5     0.666 4     0.747 42     805.7      141.0       0.319 5    127.3
   2     1 856       1.036 1     0.849 92     916.2      110.5       0.233 2     92.9
   3     6 318       1.405 8     0.920 10     991.9       75.7       0.148 5     59.2
   4    12 544       1.775 5     0.962 09   1 037.1       45.2       0.082 5     32.9
   5    17 812.5     2.145 2     0.984 03   1 060.8       23.7       0.040 0     15.9
   6     5 184       2.514 9     0.994 04   1 071.6       10.8       0.016 9      6.7
   7    13 205.5     2.884 6     0.998 04   1 075.9        6.4       0.006 2      2.5

 Total 179 147         –            –           –      1 078.0         –          –

 Columns (8) to (10), (12) and (13) refer to the upper limit of each sub-range.

 ν′_4 = 179 147/1 078 = 166.185
        −4ν′_3 ν′_1   = −10.936
                        -------
                        155.249
       +6ν′_2 ν′_1^2  =  +4.110
                        -------
                        159.359
          −3ν′_1^4    =  −0.025
                        -------
             ν_4      = 159.334
           −½ν_2      =  −3.700
                        -------
                        155.634
           +7/240     =  +0.029
                        -------
             μ_4      = 155.663
             β_2      =   2.908 3

 mean − mode = −0.170 1
 mean = 68.0 − 0.302 4 = 67.697 6 inches
 mode = 67.867 7 inches

a half is put into each), and columns (4), (5), (6) and (7) are obtained
successively from the previous one by multiplying by the corresponding
units in column (2). In column (4) −27 = 3 × (−9), in
column (5) 243 = −27 × (−9), in column (6) −2 187 = 243 × (−9)
and in column (7) 19 683 = −2 187 × (−9), and so on for the other
rows. The arithmetic may be checked by multiplying each frequency
directly by $x'^4$ and so checking column (7) term by term; if that
is correct the other columns from which that was obtained must
almost certainly be right. The sums of these columns (paying regard
to sign) give the $\nu'$ coefficients, and these are corrected step by step
below the table. The resulting moments are in inch-units already,
since the sub-groups are 1 inch wide. The mean − mode is found
from the formula on p. 34; $\nu'_1 = -0.3024$, and so the mean is
0.3024 inch less than the arbitrary origin, which is 68.0 inches
(the centre of the group at which $x' = 0$). Hence mean height
= 68.0 − 0.3024 = 67.6976 inches. We shall refer to columns
(8) to (13) of Table 1.5 in the next chapter.
In this example, the constants have been calculated correct to 
several decimal places. This has been done advisedly, for when 
complicated computations are performed, errors due to “dropping 
figures” too soon are apt to accumulate and become very large, and 
it is always well to be on the safe side. The number of figures used 
in computing this example is near the minimum advisable. The 
final result may be “rounded off” to a few figures, if it is not going 
to be used in further calculations. 
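
The arithmetic of Table 1.5 may also be followed mechanically. The following is a minimal sketch in Python, not part of the original computation: it assumes the frequencies of column (3) as tabulated (the frequency 142 for the group 65.5- is the value implied by the products in columns (4) to (6)), an arbitrary origin of 68.0 inches and $h = 1$. All variable names are illustrative.

```python
# Moments of the fathers' heights from grouped data, following Table 1.5.
units = list(range(-9, 8))                      # column (2): x' = -9 ... 7
freqs = [3, 3.5, 8, 17, 33.5, 61.5, 95.5, 142, 137.5, 154,
         141.5, 116, 78, 49, 28.5, 4, 5.5]      # column (3): n_t

N = sum(freqs)                                  # total frequency

def raw_moment(r):
    """Mean of the r-th power of x' about the arbitrary origin (equation 1.3)."""
    return sum(n * x ** r for n, x in zip(freqs, units)) / N

v1, v2, v3, v4 = (raw_moment(r) for r in (1, 2, 3, 4))

# Crude moments about the mean, obtained from the moments about the origin.
nu2 = v2 - v1 ** 2
nu3 = v3 - 3 * v2 * v1 + 2 * v1 ** 3
nu4 = v4 - 4 * v3 * v1 + 6 * v2 * v1 ** 2 - 3 * v1 ** 4

# Sheppard's corrections (1.4), with sub-ranges of unit width.
h = 1.0
mu2 = nu2 - h ** 2 / 12
mu3 = nu3
mu4 = nu4 - 0.5 * nu2 * h ** 2 + 7 * h ** 4 / 240

mean_height = 68.0 + h * v1                     # back to inches
sigma = mu2 ** 0.5
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2

print(f"mean = {mean_height:.4f} in, sigma = {sigma:.4f} in")
print(f"beta1 = {beta1:.6f}, beta2 = {beta2:.4f}")
```

Carrying full machine precision throughout, as here, is the modern equivalent of not "dropping figures" too soon; the results agree with those below the table to the figures printed there.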



CHAPTER II


DISTRIBUTIONS DERIVED FROM THEORY OF PROBABILITY


Probability 

2.1. There are events about which our knowledge is so complete
that we are able to predict with complete certainty whether or not
they will occur, and on the other hand there are events about which
we know nothing, so that we are unable to make any prediction.
An event of the first kind is the falling of a penny tossed into the
air (we are certain it will fall), and one of the second kind is the
falling of the penny with the head (say) uppermost. Between these
two extremes there are events about which we know something, but 
not enough to allow of certain prediction; these are the province 
of the theory of probability. The extent to which we can rely on a 
prediction varies in degree for different events depending on the 
amount of knowledge we have of the factors that determine the event, 
and a measure of the degree is called the probability of the event. 

Probability is expressed on a somewhat arbitrary scale of numbers
between unity and zero, a value of 1 or 0 corresponding to certainty
that the event will or will not occur, and intermediate values to
intermediate degrees of certainty; a probability of 0.5 means that
the event is as likely to occur as not. Defined in this vague way,
there seems to be little objectivity about probability as a measure,
but precision is achieved by giving the word a special meaning and
concreteness by applying it to a restricted field.* These two steps
result in mathematical and statistical views of probability. By
extending the conceptions of these special fields to the events of 
ordinary life, the probability of any event is usually expressed as 
chances for or against the event, and the analogy makes the idea of 
probability fairly definite and concrete. We shall see how this arises 
in the next two sections. 

* A general calculus of probabilities has been developed for application 
to the general problem of expressing the relations between events and the 
amount of knowledge of the determining factors, or between propositions 
and the data on which they are based. The application of this calculus has 
not received universal acceptance, but this need not concern us, as we are 
only interested here in the mathematical and statistical applications that are 
almost, if not quite, universally accepted.

Mathematical Probability

2.11. This view of probability is based on a conception of a number
of equally likely chances, some of which are favourable to the
occurrence of the event and some of which are unfavourable. The
ratio of the favourable to the total chances is the probability of the
event. This leads to a similar scale to that mentioned above, varying
between zero and unity. It is usual and helpful to imagine an
experiment like that of throwing a perfect six-sided die, all sides of
which are equally likely to turn up. Only one side has a six, so the
probability of turning up a six is 1/6. Other idealised games of chance
suggest other typical experiments: drawing blindfold from a bag
of well-mixed balls that differ only in colour, or drawing cards from
a well-shuffled pack, or spinning a perfectly balanced roulette wheel.

The calculus of mathematical probabilities is purely concerned 
with finding the probability of composite events knowing those of 
the simple components, and as such it is an exercise in permutations 
and combinations, enumerating the chances for and against the 
composite event. There are two fundamental rules that should be 
understood.

Rule I. The probability of occurrence of one or other of a
number of independent events, only one of which can occur at a time,
is the sum of the probabilities of the separate events. For example,
it is easy to see that when throwing a die, two of the six equally
likely chances favour the turning up of either a five or a six and
that the probability is 1/6 + 1/6 = 1/3.

Rule II. The probability of the simultaneous occurrence of a
number of independent events is the product of their separate probabilities.
Thus, when throwing two dice, there are 36 equally likely
possibilities, and of these, only one is favourable to a double six;
the probability of a double six is 1/36, and this is 1/6 × 1/6.

The word independent has its ordinary English meaning in the 
above rules. 
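
The two rules can be checked by direct enumeration of the equally likely chances. The following Python sketch is offered only as an illustration; the helper `probability` is a hypothetical name, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # the six equally likely sides of a perfect die

def probability(is_favourable, chances):
    """Ratio of favourable chances to total chances."""
    chances = list(chances)
    favourable = sum(1 for c in chances if is_favourable(c))
    return Fraction(favourable, len(chances))

# Rule I: a five or a six on one die (only one can occur at a time).
p_five = probability(lambda f: f == 5, faces)
p_six = probability(lambda f: f == 6, faces)
p_five_or_six = probability(lambda f: f in (5, 6), faces)

# Rule II: a double six on two dice (36 equally likely possibilities).
p_double_six = probability(lambda pair: pair == (6, 6), product(faces, repeat=2))

print(p_five_or_six, "=", p_five + p_six)       # sum of the separate probabilities
print(p_double_six, "=", p_six * p_six)         # product of the separate probabilities
```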

Sometimes the chances for and against an event are specified
instead of the probability; a probability of m/(m + n) may be described
by the statement "the chances are m to n for the event."

Statistical Probability 

2.12. The mathematical laws of probability find much use in the
theory of statistics for calculating the relations between samples and
populations. Superficially, from the statistical standpoint, the
drawing of a sample of men from the population of
England is very like drawing balls from a bag, provided
that the technique of sampling is made to satisfy the essential
conditions for the application of the laws of probability. Then, the
occurrence of an event becomes the drawing of an individual of a
given character, the set of equally likely chances becomes a population
of equally likely individuals, and a given combination of
events becomes a sample of given composition. Samples which
follow the laws of mathematical probability are called random
samples;* they satisfy the condition that every individual
must be independent, i.e. its character must not be influenced in
any way by the character of any other individual in the sample.

On the statistical view, the probability of drawing an individual
of a given character is the proportion of individuals in the
population having that character; statistical probability is a proportionate
frequency. Conversely, any probability statement may be
interpreted in those terms; if we say that the probability of a
given horse winning a race is one-fifth, we may imagine a
very large number or population of races run under the
same conditions, in one-fifth of which races the given horse wins.

This interpretation of probability is a little more concrete and so
is a little more satisfying practically than its most general description
as a ratio expressing a degree of confidence or knowledge, or a ratio
of chances, but it is not quite satisfactory. A population is
rarely known, even when it is so concrete a thing as a bulk of
corn that is being sampled; if it were known, there would be no
question of sampling. Moreover, it has been found in statistical
experience that small samples from the same bulk vary among
themselves considerably, but that as they increase in size they
become more stable and show less sporadic variations. It
is assumed that this tendency continues indefinitely as the size of
the sample increases, and that there is a single limiting form to which
a sample from a population tends as the sample increases in
size. This limiting form is what the statistician conceives of as the
infinite population, and in more ordinary language may be described
as the result that would be obtained in the "long-run" of experience.
The probability of a given kind of individual is the proportion of
individuals of that kind in this infinite population. We do expect,
in the long-run, that events having given probabilities will occur and
fail in nearly the relative proportions given by the probabilities.

* We are dealing with instances in which the population is very
large compared with the sample.
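
The tendency of proportions to become more stable as samples grow can be illustrated by an imaginary sampling experiment. The sketch below is an assumption-laden illustration in Python, not an experiment from the text: it draws pseudo-random "throws" with a fixed probability of 1/6, and the function names, the seed and the sample sizes are all arbitrary choices.

```python
import random

def sample_proportions(sample_size, n_samples, p=1 / 6, seed=0):
    """Proportion of occurrences in each of n_samples random samples."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    return [
        sum(rng.random() < p for _ in range(sample_size)) / sample_size
        for _ in range(n_samples)
    ]

def spread(values):
    """Range of the observed proportions, a crude measure of stability."""
    return max(values) - min(values)

small = sample_proportions(sample_size=10, n_samples=20)
large = sample_proportions(sample_size=10_000, n_samples=20)

print("spread among samples of 10:    ", round(spread(small), 4))
print("spread among samples of 10 000:", round(spread(large), 4))
```

The small samples vary widely among themselves, while the large ones cluster closely about 1/6; this is the behaviour that the postulate of a limiting form extends indefinitely.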

The above postulate of the stability of proportionate frequencies
in infinite samples cannot be proved, since owing to the limitations
of human patience and powers it is impossible to increase the size
of an actual sample indefinitely, but on it has been based the whole
theory of statistical sampling, the extensive use of which has failed
to reveal any inconsistencies that throw doubt on the basic postulate.
This will be discussed again in section 2.4.

The infinite population is seen to be distinct from the bulk that
is being sampled, and is an abstraction whose physical existence
cannot be shown; it is dependent on the sampling technique
as well as on the bulk being sampled. Since interest lies in the
characteristics of the bulk, it is important to arrange the
sampling technique so that the infinite population and the bulk
sampled are substantially the same.

2.13. The foregoing discussion of probability may be summed up
roughly in the following terms. It is assumed that in the long-run
of experience, the proportions of occurrences of different kinds of
chance events will tend to have stable values which define the infinite
population. The long-run proportion of any given kind of event is
its probability. For random sampling, the probabilities of composite
events may be calculated from those of simple events by using the
laws of mathematical probability. In practice, probability statements
are interpreted in terms of proportionate frequencies in the
long-run of experience, even when used to measure a degree of
subjective confidence that the event will happen. Thus, an event
that happens frequently in the long-run has a high probability, and
on any single occasion we have a considerable degree of confidence
that an event with a high probability will happen.

We shall now introduce some theoretical frequency distributions
that are derived from the laws of mathematical probability, and
describe some of the applications to statistical data.

Binomial Distribution

2.2. If we had four perfect dice and threw them repeatedly, counting
the number of sixes that turned up, we should find at different
throws, four, three, two, one and zero sixes, and if we made enough
throws we could form a frequency distribution in which the variate
was the number of sixes per throw, or alternatively the number of
"not-sixes," and the individuals were throws. This is the distribution
dealt with in this section.

The imaginary experiment may be generalised by calling the casting
of a six the occurrence of an event or a success, the casting of a number
other than a six a non-occurrence or failure, and the throw of four
dice a set of $n$ trials ($n = 4$). Then if the probability of a success is
$p$ and that of a failure is $q$ ($p + q = 1$), in a set of $n$ independent
trials the probability of $(n - s)$ successes and $s$ failures is*

$$\frac{n(n-1)\ldots(n-s+1)}{s!}\,p^{\,n-s}q^{\,s}. \qquad (2.1)$$

TABLE 2.1

 Number of          Number of            Proportion of Sets
 Occurrences        Non-occurrences
 per Set            per Set

 n                  0                    p^n
 n − 1              1                    n p^(n−1) q
 n − 2              2                    [n(n−1)/2!] p^(n−2) q^2
 n − 3              3                    [n(n−1)(n−2)/3!] p^(n−3) q^3
 ...                ...                  ...
 0                  n                    q^n

 Total              ...                  1

* This follows from the rules of mathematical probability given in
Section 2.11. The probability that $(n - s)$ particular trials will be successes
and $s$ will be failures is $p^{\,n-s}q^{\,s}$ (Rule II). There are

$$\frac{n(n-1)\ldots(n-s+1)}{s!}$$

ways in which $(n - s)$ successes and $s$ failures may occur in a sample
of $n$ (this is a result from the theory of combinations), so from Rule I, the
probability that any $(n - s)$ trials will be successes and any $s$ will be
failures is the expression (2.1).

The probabilities for different values of $s$ are set out in Table 2.1
in the form of a frequency distribution with proportionate frequencies.
This is called the binomial frequency distribution because
the proportionate frequencies are the terms in the expansion of the
binomial $(p + q)^n$. For example, for the four perfect dice the distribution
of sixes is the expansion of the binomial $(\tfrac{1}{6} + \tfrac{5}{6})^4$ and is
shown in Table 2.2. The variate of this distribution is discrete, and
it will be noted that all the proportionate frequencies may be
described in terms of the two constants $p$ (or $q$) and $n$.

TABLE 2.2

 Number of Sixes         4         3          2           1           0           Total
 Proportion of Throws    1/1296    20/1296    150/1296    500/1296    625/1296    1
 Percentage of Throws    0.08      1.54       11.57       38.58       48.23       100.00
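
The entries of Table 2.2 follow directly from expression (2.1). As a minimal sketch in Python (the function name `binomial_terms` is illustrative, and exact fractions are an implementation choice), the terms for four dice may be generated thus:

```python
from fractions import Fraction
from math import comb

def binomial_terms(n, p):
    """Terms of (p + q)^n; term s is the probability of (n - s) successes
    and s failures, as in expression (2.1)."""
    q = 1 - p
    return [comb(n, s) * p ** (n - s) * q ** s for s in range(n + 1)]

# Four perfect dice, a six counting as a success: expansion of (1/6 + 5/6)^4.
terms = binomial_terms(4, Fraction(1, 6))

for sixes, term in zip(range(4, -1, -1), terms):
    print(f"{sixes} sixes: {term.numerator * (1296 // term.denominator)}/1296"
          f" = {float(100 * term):.2f} per cent")
```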


It is a matter of algebra to calculate the moments of the general
distribution given in Table 2.1; they are:

Mean Number of Occurrences, $l = np$,

Second Moment, $\mu_2 = npq = l\left(1 - \dfrac{l}{n}\right)$,

whence

Standard Deviation, $\sigma = \sqrt{npq} = \sqrt{l\left(1 - \dfrac{l}{n}\right)}$,

Third Moment, $\mu_3 = npq(q - p)$, $\quad\beta_1 = \dfrac{(q - p)^2}{npq}$, $\qquad$ (2.2)

Fourth Moment, $\mu_4 = npq[1 + 3(n - 2)pq]$, $\quad\beta_2 = \dfrac{1}{npq} + \dfrac{3(n - 2)}{n}$.

It will be noted that the second and fourth moments are symmetrical
in $p$ and $q$, i.e. the result is the same whether the occurrences
or failures are the variate, and that for the third moment the interchange
of $p$ and $q$ merely alters the sign.
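
Equations (2.2) may be verified for any particular case by summing over the distribution. The Python sketch below does this for the four dice, as a check under the assumption that moments are taken about the mean with divisor unity (proportionate frequencies); `binomial_central_moments` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def binomial_central_moments(n, p):
    """Mean and second, third and fourth moments about the mean of the
    number of occurrences, found by direct summation."""
    q = 1 - p
    probs = {k: comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)}
    mean = sum(k * w for k, w in probs.items())

    def central(r):
        return sum((k - mean) ** r * w for k, w in probs.items())

    return mean, central(2), central(3), central(4)

n, p = 4, Fraction(1, 6)
q = 1 - p
mean, mu2, mu3, mu4 = binomial_central_moments(n, p)

# Compare with the closed forms of equations (2.2).
print(mean == n * p)
print(mu2 == n * p * q)
print(mu3 == n * p * q * (q - p))
print(mu4 == n * p * q * (1 + 3 * (n - 2) * p * q))

# Interchanging p and q leaves mu2 and mu4 unchanged and reverses the sign of mu3.
_, mu2_swapped, mu3_swapped, mu4_swapped = binomial_central_moments(n, q)
```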


Poisson Series 

2.3. A binomial distribution may be imagined in which the probability
of a failure, $q$, is very small, that of a success, $p$, is nearly
equal to unity, and the number of trials per set, $n$, is exceedingly
large, so that the mean number of failures per set, $m = nq$, is of
moderate dimensions. Then for any moderate value, $s$ is negligible
compared with $n$ and $(n - s)$ may be equated to $n$. Doing this in
expression (2.1) the probability of $s$ failures is

$$\frac{n^s}{s!}\left(1 - \frac{m}{n}\right)^{n}\frac{m^s}{n^s}$$

and the limit of this as $n$ approaches infinity is

$$e^{-m}\,\frac{m^s}{s!}, \qquad (2.3)$$

where $e$ is the exponential base = 2.718 . . .

The expansion of this for different values of $s$ is the Poisson Limit
to the Binomial or the Poisson Series or the Law of Small Numbers.

This distribution is defined entirely by the one constant $m$, which
is the mean. The other moments are given by the limits to equations
(2.2) as $n$ approaches infinity, and of these it is only important
to note that the second moment, which may be written $(1 - m/n)m$,
equals the mean in the limit.

To calculate the terms of the series for a given value of $m$, either
the expression (2.3) may be evaluated, with the aid of logarithms,
for $s = 0$, $s = 1$, $s = 2$ and so on, or Soper's tables in Pearson's
collection of tables (1931) may be used. These tables give the values
of expression (2.3) to six decimal places for values of $m$ between
$m = 0.1$ and $m = 15.0$, and within this range, for all values of $s$
that have any proportionate frequency.
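
Where tables are not to hand, expression (2.3) is easily evaluated directly, and the approach of the binomial to its Poisson limit can be watched numerically. The following is a small Python sketch under the assumptions of this section ($m = nq$ held fixed while $n$ increases); the function names and the chosen values $m = 2$, $s = 3$ are illustrative only.

```python
from math import comb, exp, factorial

def poisson_term(m, s):
    """Expression (2.3): probability of s failures when the mean is m."""
    return exp(-m) * m ** s / factorial(s)

def binomial_failures(n, m, s):
    """Exact binomial probability of s failures in n trials, with q = m / n."""
    q = m / n
    return comb(n, s) * (1 - q) ** (n - s) * q ** s

m, s = 2.0, 3
limit = poisson_term(m, s)
errors = []
for n in (10, 100, 1000, 100_000):
    exact = binomial_failures(n, m, s)
    errors.append(abs(exact - limit))
    print(f"n = {n:>6}: binomial {exact:.6f}, Poisson {limit:.6f}")
```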


Use of the Binomial and Poisson Distributions for Testing 
Randomness 

2.4. One of the most elementary and direct experimental tests of
the assumption of an infinite population and of the applicability 
of the laws of mathematical probability to actual experience consists 
in seeing if the binomial distribution describes the variations in the 
numbers of successes in sets of trials. Dice have been thrown by 
different experimenters many thousands of times in the aggregate, 
and the resulting frequency distributions have been compared with 
the corresponding theoretical binomial distributions. Other experiments
have taken the form of tossing coins, and thousands of runs
of roulette wheels in casinos have been observed for this purpose.
It cannot be said, however, that these experiments have either proved
or disproved the laws of probability, for when there has been a 
discrepancy between theory and experiment, it has usually been 
possible to find reasons why the particular experiment failed and 
the general applicability of the laws of probability has remained 
unquestioned. We believe that these laws are no more to be proved 
or disproved experimentally than are (say) the terms in a Fourier 
analysis of a series. They are statements of mathematical relationships
that are useful in analysing statistical sampling experience, and
do in fact describe an important element in that experience. The 
justification for the laws is that they are useful in this way. This is 
the attitude in which the analyses of this section are regarded. 

In many collections of statistical data, individuals are of two kinds
and divided into sets; the binomial distribution may be used to see
if, for any particular collection, the individuals are independent and
the two kinds occur at random in the sets. A special experiment to
illustrate this was conducted by incubating 800 cabbage seeds on
filter paper in rows of ten, and after eight days the number of
germinated seeds in each row was counted. These data were formed
into a frequency distribution in which the variate was the number
of germinated seeds per row and the individuals were rows; this is
given in the second column of Table 2.3. If the germinated seeds
were distributed at random among the rows, this distribution would
be expected to be a binomial with $n = 10$. We do not know the
probability $p$ of a single seed germinating, but may estimate it
from the relationship between the mean $l$, $n$ and $p$ given in
equations (2.2). The mean number of germinated seeds per row
is (0 × 6 + 1 × 20 + ... + 5 × 6) ÷ 80 = 2.175, and the estimate
of $p$ is therefore 2.175 ÷ 10 = 0.2175. The proportionate frequencies
of the expected binomial distribution are given by the
expansion of $(0.2175 + 0.7825)^{10}$, and these multiplied by 80 are
the expected frequencies of Table 2.3. The agreement between the
actual and expected frequencies is quite good,* and as far as may
be judged from this limited experience, the variations in germinated
seeds from row to row were random; the seeds were well mixed
and independent and conditions were uniform, so that there was a
constant probability of any one germinating.

TABLE 2.3

 Number of Seeds           Frequency of Rows
 Germinated per Row        Actual      Expected

 0                            6           6.9
 1                           20          19.1
 2                           28          24.0
 3                           12          17.7
 4                            8           8.6
 5                            6           2.9
 6                            –           0.7
 7                            –           0.1

 Total                       80          80.0
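
The expected column of Table 2.3 can be reproduced from the actual column alone. The sketch below, in Python, estimates $p$ from the mean and expands the binomial as described above; the variable names are illustrative, and the computed values agree with the tabulated ones only to within rounding (the term for two seeds comes out at about 23.9 against the tabulated 24.0).

```python
from math import comb

# Actual frequencies of Table 2.3: rows with 0, 1, ..., 5 germinated seeds
# (no row had more than five).
actual = {0: 6, 1: 20, 2: 28, 3: 12, 4: 8, 5: 6}
n = 10                                    # seeds per row

rows = sum(actual.values())               # 80 rows
mean = sum(k * f for k, f in actual.items()) / rows
p = mean / n                              # estimate of p from l = np

# Expected frequencies: 80 times the terms of (p + q)^10.
expected = [rows * comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

print(f"mean = {mean:.3f}, p = {p:.4f}")
for k in range(8):
    print(k, actual.get(k, 0), round(expected[k], 1))
```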

It may be objected that since the expected distribution was 
derived from the actual by choosing $p$ to make the two means equal,
the closeness of agreement signifies nothing. This objection is not 
valid, however, since the two distributions have only been made to 
agree in two respects, mean and total frequency, and they have not 
been made to agree in form; that agreement is a consequence of 
randomness. 

Another test of randomness is that the various moments should 
bear the same relations to each other as those of the theoretical 
binomial distribution given in expression (2.2). The most important 
relationship is that between the second moment and mean. The 
second moment or variance of the actual distribution in Table 2.3, 
calculated by the technique of paragraphs 1.31 and 1.32, is 1.744,
and $l(1 - l/n)$, which may be termed the expected variance, is
1.702; again the agreement is good.
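
The comparison of actual and expected variance is a short computation. The following Python sketch assumes, as the figures in the text imply, that the second moment is taken with divisor $N$; the function names are hypothetical.

```python
def mean_and_variance(freq):
    """Mean and second moment about the mean of a frequency distribution
    given as {value of variate: frequency}, using divisor N."""
    total = sum(freq.values())
    mean = sum(k * f for k, f in freq.items()) / total
    variance = sum(f * (k - mean) ** 2 for k, f in freq.items()) / total
    return mean, variance

def binomial_expected_variance(mean, n):
    """l(1 - l/n), the variance of a binomial with mean l and n trials per set."""
    return mean * (1 - mean / n)

seeds = {0: 6, 1: 20, 2: 28, 3: 12, 4: 8, 5: 6}   # actual column of Table 2.3
l, actual_variance = mean_and_variance(seeds)
expected_variance = binomial_expected_variance(l, n=10)

print(f"actual variance   = {actual_variance:.3f}")
print(f"expected variance = {expected_variance:.3f}")
```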

* A criterion for judging the closeness of agreement between two
distributions will be given in Chapter IV.

The almost classical example of a Poisson distribution is that given
by counts of yeast cells in the squares of a haemacytometer. The
liquid in which the yeast cells are suspended can be regarded as
consisting of aggregates of molecules of the liquid about equal in
size to the yeast cells which are sparsely distributed among them.
Then the probability that any aggregate taken at random is a yeast
cell (the aggregate being a trial and the yeast cell a failure in
the language of the theory) is extremely small; but there are very
many such aggregates in the liquid under one square in the haemacytometer
(i.e. in the set), so that the mean number of yeast cells
per square is finite. Consequently, if the cells are distributed independently
and at random through the suspending liquid, the frequency
distribution of number per square should be the Poisson
Series. Table 2.4 gives the distribution of the counts in 400 squares
found by "Student" (1907). The mean number of cells per square
is 4.68, and from Soper's tables the Poisson Series having a mean
($m$) of the same value is constructed, the separate terms being multiplied
by 400 to give the expected frequencies of Table 2.4.
Thus, for $m = 4.6$, 0.010 052 of the squares should have zero cells
and for $m = 4.7$ this frequency is 0.009 095; hence by linear
interpolation, for $m = 4.68$ it is 0.010 052 − 0.8 × 0.000 957
= 0.009 286, and this multiplied by 400 gives the expected frequency
of 3.71. The agreement between the two distributions is quite
good. The second moment of the observed distribution is 4.46,
and is nearly equal to the mean.

TABLE 2.4

 Number of Cells        Frequency of Squares
 per Square             Observed     Expected

  0                        –            3.71
  1                       20           17.37
  2                       43           40.65
  3                       53           63.41
  4                       86           74.19
  5                       70           69.44
  6                       54           54.16
  7                       37           36.21
  8                       18           21.18
  9                       10           11.02
 10                        5            5.16
 11                        2            2.19
 12                        2            0.86
 13                        –            0.31
 14                        –            0.10
 15                        –            0.03
 16                        –            0.01

 Total                   400          400.00
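
The expected frequencies of Table 2.4 may be computed directly from expression (2.3) instead of by interpolation in tables. The Python sketch below assumes the observed frequencies as tabulated, with the blank entry for zero cells taken as 0 (which the total of 400 requires); the names are illustrative, and small differences from the printed figures in the last place are to be expected, since those were obtained by interpolation.

```python
from math import exp, factorial

# Observed frequencies of squares with 0, 1, ..., 12 cells (Table 2.4).
observed = [0, 20, 43, 53, 86, 70, 54, 37, 18, 10, 5, 2, 2]

total = sum(observed)
mean = sum(s * f for s, f in enumerate(observed)) / total
variance = sum(f * (s - mean) ** 2 for s, f in enumerate(observed)) / total

# Expected frequencies: 400 times the Poisson terms with the same mean.
expected = [total * exp(-mean) * mean ** s / factorial(s) for s in range(17)]

print(f"mean = {mean:.2f}, second moment = {variance:.2f}")
for s in range(17):
    obs = observed[s] if s < len(observed) else 0
    print(f"{s:>2}  {obs:>3}  {expected[s]:6.2f}")
```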
In general, the Poisson distribution would be expected to apply
where events occur at random and under constant conditions in a
medium (usually space or time) that may be divided into a number
of equal zones, provided that relative to the size of the zone the
medium is continuous (i.e. there is a very large number of elemental
units of medium per zone) and the events are rare "points," i.e.
there is room in each zone for an exceedingly large number of events
although the actual number in any zone is moderate. Examples are
the distribution of microscopic and ultra-microscopic particles and
bacteria in liquids, the numbers of α-particles emitted from radioactive
substances in intervals of time, and counts of weeds or pests
in given areas of land in agricultural field trials.

When the binomial and Poisson distributions fit any given experimental
data, it is inferred that the variations are due to chance and
cannot be reduced or controlled unless the whole character of the
data can be altered by introducing some method of selecting individuals
of a given kind. It does not always happen, however, that
there is good agreement between theory and experiment. Where 
there are discrepancies, the plain fact is that the incidence of the 
event is not random, but the matter is not often allowed to rest 
there. The investigator usually tries to find the cause of the lack 
of randomness, and this extends beyond the realm of statistics. 
Sometimes the cause may be some fault in sampling technique that 
may be corrected; sometimes the search for the cause may lead to 
a discovery of scientific value. A frequent kind of discrepancy is that 
in which the actual variance is greater than the expected, and this 
is usually interpreted by assuming that conditions are not uniform, 
regarding the probability of the event as varying from one set of 
trials or zone to another. Then the actual distribution is composed 
of several binomial or Poisson distributions superimposed, and the 
actual variance is equal to the expected variance plus that due to 
the fluctuations in the probabilities. In such instances, the experience 
is not merely dismissed as being non-random, but the variation is 
analysed into two parts, systematic and random. The principles of 
such analyses will be dealt with in Chapter VI.

There are kinds of discrepancy other than that mentioned, 
and these lead to other analyses, all of which include a random 
element. 

Readers who are interested in further examples of the uses of the
binomial and Poisson distributions are referred to the following
references: Cochran (1936), Fisher (1936), Przyborowski and
Wilenski (1935), and Tippett (1934).


Normal Distribution

2.5. The Normal distribution may be derived mathematically as
another limiting form of the binomial that is approached as $n$
becomes very large, both $p$ and $q$ remaining finite. A binomial distribution
may be represented by a histogram with each group centred
over the value of the variate corresponding to the number of occurrences
per set; there are $(n + 1)$ groups and the outline of the
diagram is of the characteristic stepped form. As $n$, and hence the
number of groups, increases, it becomes necessary to reduce the
scale of the variate to keep the diagram within reasonable dimensions
and so the steps in the outline become smaller. If this process
continues indefinitely, it may easily be imagined that in the limit
the steps in the outline coalesce to form a smooth curve; this is the
frequency curve of the Normal or Gaussian distribution. Since it
is a continuous curve, it necessarily has a continuous variate. Its
equation may be written

$$y = \frac{N}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{1}{2}\frac{(x-m)^2}{\sigma^2}},$$

where $e$ is the exponential base = 2.718 28 . . . , $x$ is the variate,
and $N$, $m$ and $\sigma$ are constants. It will be seen later why the equation
is written in this particular form and the $\sqrt{2\pi}$ and $-\tfrac{1}{2}$ are not
incorporated in the constants. The derivation of this equation from
the binomial is purely an algebraic process.

This curve is that given in Fig. 2 for $\beta_1 = 0$, $\beta_2 = 3$ and in
Fig. 3. It extends between $x = +\infty$ and $x = -\infty$, since only at
those extremes does $y = 0$, and it is symmetrical about an ordinate
at $x = m$, i.e. about the ordinate at O in Fig. 3.

This frequency curve is deduced from a histogram and consequently
areas under the curve and not heights of ordinates represent
frequencies. It is therefore appropriate to use the notation and ideas
of the integral calculus, to imagine about any given value of $x$ an
elemental sub-range $dx$, and to regard the area under the curve
between ordinates drawn at the limits of the sub-range as an element
of frequency, $df$ (say). Then, this elemental strip may be regarded
as a rectangle of height $y$ and

$$df = y\,dx = \frac{N}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{1}{2}\frac{(x-m)^2}{\sigma^2}}\,dx. \qquad (2.4)$$

Fig. 3.

It is now possible to find the various moments of (2.4). The
total area under the curve is the total frequency, and integrating
(2.4) between the limits of $x = \pm\infty$ this is found to be $N$. Hence
$N$ in (2.4) is the total frequency. The other moments may be found
from equation (1.3), p. 38, substituting $df$ for $n_t$, the frequency in
the sub-group, and integrating instead of summing. Thus the mean is

$$\frac{1}{N}\int_{-\infty}^{\infty} x\,df = m,$$

so that the constant $m$ is the mean of the distribution. Similarly
the other moments may be found, and the first four are:

mean = $m$,
$\mu_2 = \sigma^2$,
$\mu_3 = 0$, $\beta_1 = 0$,
$\mu_4 = 3\mu_2^2$, $\beta_2 = 3$.

These expressions for the mean and variance cannot be deduced
from those given in (2.2) for the binomial distribution by writing
$n = \infty$, for they would both become infinite; this is balanced in
the derivation of the curve by the ultimate reduction in the scale
of the variate, referred to above. The ratios $\beta_1$ and $\beta_2$, however, are
pure numbers, and by putting $n = \infty$ in (2.2) they reduce to the
values given above for the normal curve.


Determination of Normal Frequencies 

2 . 51 . Apart from the total frequency, AT, any given normal distri¬ 
bution is completely characterised by the two constants m and o. 
We are interested in calculating proportionate frequencies between 
various limits of x. This could be done by integrating (2.4) between 
the limits if it were possible to evaluate the integral, but as it is 
not, ordinates must be calculated, and the integration performed by 
quadrature or by some graphical means. 

For example, in Fig. 3 the proportionate frequency having values 
less than a value denoted by G on the variate scale is the area under 
the curve to the left of GH, expressed as a fraction of the total area
under the curve; the proportion between G and E is the area between
GH and EF; the proportion greater than E is the area to the right
of EF. This process of integration by quadrature is possible, but 
laborious. However, tables have been calculated, and these reduce 
the labour considerably. 
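To show what integration by quadrature involves, here is a minimal sketch using Simpson's rule, checked against the error function of the Python standard library. The function names are our own, and the values of m and σ are borrowed from the heights example purely for illustration.

```python
import math

def normal_density(x, m, sigma):
    """Ordinate of the normal curve for unit total frequency."""
    return math.exp(-((x - m) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def simpson_area(lo, hi, m, sigma, strips=200):
    """Area under the curve between lo and hi by Simpson's rule (strips must be even)."""
    h = (hi - lo) / strips
    total = normal_density(lo, m, sigma) + normal_density(hi, m, sigma)
    for i in range(1, strips):
        weight = 4 if i % 2 else 2  # odd ordinates weigh 4, even interior ordinates 2
        total += weight * normal_density(lo + i * h, m, sigma)
    return total * h / 3

def erf_area(lo, hi, m, sigma):
    """The same area from the error function, as a check on the quadrature."""
    def cumulative(x):
        return 0.5 * (1 + math.erf((x - m) / (sigma * math.sqrt(2))))
    return cumulative(hi) - cumulative(lo)

m, sigma = 67.6976, 2.7048  # illustrative values
by_quadrature = simpson_area(m - 2 * sigma, m + 2 * sigma, m, sigma)
by_erf = erf_area(m - 2 * sigma, m + 2 * sigma, m, sigma)
print(by_quadrature, by_erf)
```

The two results should agree to many decimal places; the 200 ordinates needed for one area show why tables are preferred for hand calculation.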

A complete set of tables for a range of values of m and σ would
be enormous, but by performing a mathematical transformation all
normal curves reduce to a standard form. The transformation consists
in measuring the variate as a deviation from the mean, divided
by the standard deviation. If we write

$$w = \frac{x - m}{\sigma} \qquad (2.5)$$

then for a sample of 1 individual (N = 1), (2.4) becomes

$$df = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}w^2}\,dw = z\,dw \qquad (2.6)$$

This may be termed the standardised form of the normal curve,
and has a mean of zero and a standard deviation of unity. The
probability integral at w is the area under the curve to the left of
an ordinate drawn at w and may be written

$$A_w = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{w} e^{-\frac{1}{2}w^2}\,dw.$$

Very complete tables of $A_w$ and z for positive equally spaced values
of w have been calculated by Sheppard and are included in Tables
for Statisticians and Biometricians, Vol. I (Pearson, 1931), where
our w is called x and our $A_w$ is called ½(1 + α).* To find the
probability integral for a negative value of w, use is made of the
fact that the distribution (2.6) is symmetrical about an ordinate at
w = 0. If $A_{-w}$ is the integral at −w and $A_w$ is the integral at w,
it is easy to see that

$$A_{-w} = 1 - A_w \qquad (2.7)$$

There are also tables that give values of w for equally spaced values
of $A_w$, while others give w for equally spaced values of $2(1 - A_w)$.
If OE in Fig. 3 represents a deviation +w and OG a deviation −w
(OE = OG, and since O is at the centre of symmetry it represents
w = 0), the area to the left of EF is $A_w$ and the "tail" to the right
is $(1 - A_w)$. Similarly, by symmetry the area of the "tail" to the
left of HG is $(1 - A_w)$, and so $2(1 - A_w)$ is the sum of the two
"tails" beyond the deviations +w and −w. The probability
integral is sometimes tabulated in this form because it is of interest
in sampling theory; this is the method of presentation used by
Fisher (1936a).

To find the probability integral for any value of an actual variate
(x in our notation) it is transformed to w by equation (2.5) and the
corresponding integral is that required. Thus, if m = 67.6976
inches and σ = 2.7048 inches and it is required to find the
probability integral at x = 60.5 inches say, then

$$w = \frac{60.5 - 67.6976}{2.7048} = -2.6610$$

From Sheppard's tables the value of the integral at a deviation of
2.66 is 0.996 09 and at 2.67 it is 0.996 21; so the first difference is
0.000 12, and by linear interpolation the integral for w = +2.6610 is

$$A_w = 0.996\,09 + 0.000\,12 \times 0.10 = 0.996\,10.$$

Hence, from equation (2.7), the integral for w = −2.6610 is
0.003 90.

The proportionate frequency between any two values of the
variate may be found by taking the difference of the two corresponding
integrals. Thus, in the above example the integral
corresponding to x = 59.5 is found to be 0.001 22 and the
proportionate frequency between 59.5 and 60.5 inches is
0.003 90 − 0.001 22 = 0.002 68.

* In this book we shall continue to use our own notation, even when
referring to these tables.
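The arithmetic of this example can be reproduced without tables. The sketch below is an illustration only: it uses `statistics.NormalDist` from the Python standard library in place of Sheppard's tables, and the function names are our own.

```python
from statistics import NormalDist

M, SIGMA = 67.6976, 2.7048  # inches, from the worked example

def standardise(x, m=M, sigma=SIGMA):
    """Equation (2.5): deviation from the mean in units of the standard deviation."""
    return (x - m) / sigma

def probability_integral(w):
    """Area under the standardised curve (2.6) to the left of w."""
    return NormalDist().cdf(w)

w = standardise(60.5)
a_605 = probability_integral(w)                  # integral at x = 60.5
a_595 = probability_integral(standardise(59.5))  # integral at x = 59.5
between = a_605 - a_595                          # proportionate frequency in the sub-range
print(round(w, 4), round(a_605, 5), round(a_595, 5), round(between, 5))
```

Small differences in the last decimal place from the tabular values are to be expected, since the tables were used with linear interpolation.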

A normal frequency distribution may be "fitted" to an actual
distribution by putting m and σ equal to the computed mean and
standard deviation respectively and then finding the frequencies in
the sub-groups from the normal probability integrals. The proportionate
frequency calculated in the above example is for the second
group of the distribution of heights of fathers in Table 1.5 and
has been calculated in this way; the process is completed in the
later columns of that table. Column (8) gives values of w corresponding
to the limits of the sub-ranges, column (9) gives probability
integrals, in column (10) these are converted to frequencies
by multiplying by the total N = 1 078, and the normal or "expected"
frequencies in column (11) are the differences of the values in the
previous column. These may be compared with the actual frequencies,
in column (3). In order to plot the curve we must
find the ordinates. Sheppard's tables give values of the ordinates z
of the standardised curve corresponding to the deviations w (see
equation 2.6), and these are given in column (12) of Table 1.5. The
ordinates of the actual curve are obtained by the transformation

$$y = \frac{N}{\sigma}\,z = 398.55\,z$$

and these are in column (13). In Fig. 4 the curve drawn from these
ordinates is superimposed on the histogram.
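The fitting procedure just described may be sketched as follows. This is an illustration under stated assumptions: the sub-range limits below are hypothetical (Table 1.5 is not reproduced here), the function names are our own, and the standard library takes the place of Sheppard's tables.

```python
from statistics import NormalDist

def expected_frequencies(limits, m, sigma, total):
    """Expected frequency in each sub-range between successive limits."""
    dist = NormalDist(m, sigma)
    cumulative = [total * dist.cdf(x) for x in limits]      # cf. column (10)
    return [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]  # cf. column (11)

def curve_ordinate(x, m, sigma, total):
    """Ordinate of the fitted curve, y = (N / sigma) * z."""
    w = (x - m) / sigma
    return total / sigma * NormalDist().pdf(w)

limits = [58.5, 59.5, 60.5, 61.5]  # hypothetical limits, in inches
freqs = expected_frequencies(limits, 67.6976, 2.7048, 1078)
peak = curve_ordinate(67.6976, 67.6976, 2.7048, 1078)
print([round(f, 2) for f in freqs], round(peak, 2))
```

The second entry of `freqs` corresponds to the sub-range 59.5 to 60.5 inches treated in the text, i.e. 1 078 times the proportionate frequency found there.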


Sometimes a frequency or proportionate frequency is given, and it 
is desired to find the corresponding value of the variate. The value 
of the standardised variate w may be found from Sheppard’s tables, 
and hence, knowing m and σ, the actual variate may be calculated from
equation (2.5).

For example, suppose it is required to know for the data of
Table 1.5 the limit of height such that 20 per cent of the fathers
are shorter than the limit and 80 per cent are taller, assuming the
distribution to be normal with mean and standard deviation equal
to the values already computed. Then $A_w$ = 0.2, and since this is
less than 0.5 it corresponds to a negative value of w; we must
therefore first find w corresponding to $A_w$ = 1 − 0.2 = 0.8. From
Sheppard's tables, the value of the variate at which $A_w$ = 0.799 546
is 0.84 and the value at which $A_w$ = 0.802 338 is 0.85. Hence, by
linear interpolation, the value at which $A_w$ = 0.8 is

$$w = \frac{0.8 - 0.799\,546}{0.802\,338 - 0.799\,546} \times 0.01 + 0.84 = 0.841\,63.$$

Hence the value of w at which $A_w$ = 0.2 is −0.841 63, and if this
is substituted in (2.5) the limit of height is found to be 65.421 inches.
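The inverse calculation can be sketched in two ways: by the linear interpolation between tabular entries used above, and directly from the inverse of the probability integral. The function name `interpolate_variate` is our own; `NormalDist.inv_cdf` is the standard-library routine standing in for the tables.

```python
from statistics import NormalDist

M, SIGMA = 67.6976, 2.7048  # inches

def interpolate_variate(target, w_lo, a_lo, w_hi, a_hi):
    """Linear interpolation for w between two tabulated (w, integral) pairs."""
    return w_lo + (target - a_lo) / (a_hi - a_lo) * (w_hi - w_lo)

# Tabular entries quoted in the text for w = 0.84 and w = 0.85.
w = interpolate_variate(0.8, 0.84, 0.799546, 0.85, 0.802338)
height_from_tables = M - w * SIGMA  # equation (2.5) with w negative

# The same limit from the inverse probability integral.
height_direct = NormalDist(M, SIGMA).inv_cdf(0.2)
print(round(w, 5), round(height_from_tables, 3), round(height_direct, 3))
```

Both routes should give the same limit of height to the three decimal places quoted.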



Relation between Normal Frequencies and Standard Deviation 

2.52. We cannot here give a full table of the probability integral
of the normal distribution, but a few important values are given in
Table 2.5. The first entry in this table is obvious; the ordinate at
w = 0 is the axis of symmetry of the curve, and the area up to this
ordinate is half the total area.

The second entry of the variate is the quartile deviation; the
proportionate frequency between positive and negative values of
this deviation is half the total. The ordinates GH and EF drawn in
Fig. 3 are at values of the variate corresponding to w = +2.0 and
−2.0, and the proportionate area of the two "tails" beyond these
limits is 0.045 50; i.e. about 5 per cent of the individuals in a
normal population deviate from the mean by twice the standard
deviation or more. Similarly, from the last entry in Table 2.5 we
see that 99.73 per cent of the individuals are contained within a
total range of six times the standard deviation. These data will
assist readers in appreciating the significance of the standard
deviation as a measure of variability when applied to a distribution
that is approximately normal.

The proportional relations between the standard deviation and
other measures of dispersion given above in section 1.23 are true
only for normal distributions, although they are roughly applicable
also to many distributions that are approximately normal.

TABLE 2.5
Normal Distribution

Variate w     Probability Integral     Sum of Two "Tails"
0             0.500 00                 1.000 00
0.674 49      0.750 00                 0.500 00
1.0           0.841 34                 0.317 31
2.0           0.977 25                 0.045 50
2.6           0.995 34                 0.009 32
3.0           0.998 65                 0.002 70
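The entries of Table 2.5 may be recomputed as a check. This short sketch uses the standard library in place of Sheppard's tables; the function name is our own.

```python
from statistics import NormalDist

std = NormalDist()  # standardised normal curve: mean 0, standard deviation 1

def table_row(w):
    """Return (w, probability integral, sum of the two tails) for a deviation w."""
    a = std.cdf(w)
    return w, a, 2 * (1 - a)

rows = [table_row(w) for w in (0.0, 0.67449, 1.0, 2.0, 2.6, 3.0)]
for w, a, tails in rows:
    print(f"{w:<10g}{a:.5f}   {tails:.5f}")
```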

Practical Applicability of the Normal Distribution 

2.53. The Normal distribution is a continuous curve, and first we
must discuss the applicability of frequency curves in general to 
practical data. 

It is assumed that for most infinite populations with a continuous
variate, the frequency distribution may be represented by a continuous
frequency curve. This is an extrapolation of the practical
experience that as the size of the sample is increased, the sub-ranges
of the distribution may be reduced and the outline of the histogram
usually becomes more regular. Where a form of curve can be assumed,
the parameters may be estimated from a large sample, and a curve
be fitted to the actual distribution, as has been done for Table 1.5.

The normal curve has also been deduced mathematically as the
distribution that results from the combination of an infinite number
of very small random errors, and this fact together with the fact
that the distribution is a special case of the binomial gives it a
fundamental status. It is believed by some to represent the deviations
due to experimental errors in measurements of a physical
constant, and a quantity that is distributed according to this law is
sometimes held to be subject only to chance causes that cannot be
controlled. Conversely, where a distribution is skew, it is often
inferred that superimposed on the chance variations are some larger
ones due to a few important causes that may be controlled. We are
dubious of this use of the normal distribution. Doubtless normality
may be made a definition, and hence a test, of "randomness" just
in the way that conformity to the binomial and Poisson distributions
has been interpreted, but it is doubtful if for the normal curve such
a course has any practical significance. Frequently, when man has
done all he can to control the variation in a character and the
remaining variations are practically random, the resulting distribution
is far from normal. Experimental errors in physical measurements
are by no means always normally distributed, and some
quantities like the strength of elements of various natural and
manufactured materials have essentially skew frequency distributions.
It may also happen that the distribution of a quantity is to
all intents and purposes normal, and yet a further degree of control
may be possible. On the whole, we doubt if the normal distribution
is of more than empirical value; it is a convenient means of describing
any data it happens to fit.

In presenting this view of the place occupied by the normal curve 
in the statistical scheme, it would be a mistake to underestimate its 
importance. The curve does in fact represent closely a very large 
number of experimental distributions and approximately represents 
many more. It is important, too, because it is the distribution on 
which most of the statistical theory of errors is based. 

Non-normal Curves 

2.6. There are several systems of smooth curves for describing
data which do not follow the normal law, of which Pearson's is
probably the most useful; all the curves in Fig. 2 are of this system.
The type is decided upon from the values of the two β ratios, and
then these, together with the standard deviation and mean, are used
to find the unknown parameters in the equation. It is not often
in practical work, however, that any useful purpose is served by
fitting such curves (their chief use is in sampling theory and in
actuarial work), so we will merely refer the interested reader to some
such book as Elderton's Frequency Curves and Correlation.

Sampling Distributions 

2.7. A sampling distribution results if a large number of finite
random samples are taken from an infinite population, some single
characteristic of each sample is computed and the computed values
are formed into a frequency distribution. For example, the observations
of Table 1.3 may be regarded as forming twenty samples of
five observations from one population. The means of these samples
are given in Table 1.3, and are formed into a frequency distribution
in Table 2.6. This table also gives the distribution of the individual
observations.

TABLE 2.6
Frequency Distributions

Sub-range     Individual observations     Means of 5
27-            4
30-            2
33-            1
36-            3
39-            7
42-            7                           3
45-           10                           4
48-           17                           5
51-           13                           2
54-           13                           5
57-            8                           1
60-            7
63-            3
66-            1
69-            3
72-            1
Total        100                          20

The distribution of means is called the sampling distribution of the
mean. Like any other frequency distribution it has a mean, and a 
standard deviation. The standard deviation of such a distribution 
is called the standard error of the mean. 

Other constants of the samples of five could have been calculated,
for example the standard deviations or the medians. Each of these
would have had its own sampling distribution and standard error.

Most sampling distributions in common use may be deduced 
mathematically by applying the laws of probability, but given 
sufficient time and energy one could determine them experimentally 
by making up an artificial population (say) by writing numbers on 
cards, shaking up in a bag, drawing samples and calculating the 
means or other statistical constants much in the way in which 
Table 2.6 was constructed from Table 1.3, except that the scale of 
the experiment would need to be much larger. This kind of technique
is actually used when the distribution cannot be deduced
mathematically. 
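The experimental determination of a sampling distribution described above is easily imitated by machine. The following is a rough sketch under stated assumptions: the "cards" are a hypothetical artificial population of 10 000 values, samples of five are drawn with replacement, and the seed is fixed only so that the run can be repeated.

```python
import random
import statistics

def sampling_distribution(population, sample_size, n_samples, statistic, rng):
    """Draw n_samples samples (with replacement) and return the statistic of each."""
    return [statistic(rng.choices(population, k=sample_size)) for _ in range(n_samples)]

rng = random.Random(1)  # fixed seed, for repeatability only
population = [rng.gauss(50, 10) for _ in range(10000)]  # hypothetical numbers on "cards"

means = sampling_distribution(population, 5, 20000, statistics.fmean, rng)
medians = sampling_distribution(population, 5, 20000, statistics.median, rng)

# The standard deviation of each sampling distribution is the standard error.
se_mean = statistics.pstdev(means)
se_median = statistics.pstdev(medians)
print(se_mean, se_median, statistics.pstdev(population) / 5 ** 0.5)
```

The standard error of the mean found by experiment should lie close to the population standard deviation divided by √5, and the median should show a somewhat larger standard error than the mean.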

Sampling Distribution of Mean 

2.71. We are particularly concerned here with the sampling distribution
of the mean in samples of size N drawn from an infinite
population in which the individuals are normally distributed. This
is itself a normal distribution with a mean equal to the mean of the
individuals in the population and a standard error of

$$\frac{\sigma}{\sqrt{N}}$$

where σ is the standard deviation of the individuals.

It will be noticed that as N increases, the standard error of the 
mean decreases. This is shown in Fig. 5, where there are given the 
sampling distributions of the means of samples of 1, 4, 16, 25
and 100 from a population in which the individuals are normally 
distributed with an unspecified standard deviation. This population 
distribution is that in Fig. 5 for N = 1. For the larger samples, 
there is less dispersion about the population mean. This is the 
statistical demonstration of the common experience that a large 
sample gives a more accurate representation of the population than 
a small one; the increase in precision is measured by a reduction 
in standard error. 
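The reduction in standard error with increasing N may be tabulated directly from σ/√N. In this small sketch the population standard deviation is taken as unity, an arbitrary choice since the text leaves it unspecified.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean of a sample of n from a population with s.d. sigma."""
    return sigma / math.sqrt(n)

# Sample sizes of the sampling distributions described in the text.
for n in (1, 4, 16, 25, 100):
    print(n, standard_error(1.0, n))
```

To halve the standard error the sample must be made four times as large.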

The normal probability integral may be used to calculate proportionate
frequencies of samples having means within given limits,
and Table 2.5 may be used by reading "deviation of sample mean
from population mean, divided by standard error" for "variate w."
Thus only 0.045 50 or about 1 in 20 of the possible samples have
means that differ from the population value by more than twice the
standard error.

Deduction of Sampling Distributions of Mean and Standard Deviation 

2.72. The sampling distributions of both the mean and standard
deviation may be deduced together. The proof is taken from a
paper by Irwin (1931).

Let the population mean be ξ and the other particulars as stated
above. Also let the variate be x and the values in any one sample
of N be $x_1, x_2, \ldots, x_s, \ldots, x_N$. Since the population distribution is
continuous, we cannot state what is the probability of a value $x_s$,
for an ordinate of the frequency curve drawn at $x_s$ has no area, but
the probability of a value lying within an elemental range $dx_s$ about
a value $x_s$ is

$$\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_s-\xi)^2}{2\sigma^2}}\,dx_s.$$

Applying the second rule of probability, the probability of a sample
having values lying within ranges $dx_1, dx_2, \ldots$ of $x_1, x_2, \ldots$, which
we may shortly describe as "the probability of the sample," is

$$\frac{1}{(2\pi)^{N/2}\sigma^N}\, e^{-\frac{1}{2\sigma^2}\left[(x_1-\xi)^2 + (x_2-\xi)^2 + \cdots + (x_N-\xi)^2\right]}\,dx_1\,dx_2 \ldots dx_N,$$

or

$$\frac{1}{(2\pi)^{N/2}\sigma^N}\, e^{-\frac{1}{2\sigma^2}\sum (x_s-\xi)^2}\,dx_1\,dx_2 \ldots dx_N.$$

Now if x̄ and s are the mean and standard deviation in the sample,*
according to sections 1.23 and 1.31,

$$x_1 + x_2 + \cdots + x_N = N\bar{x}, \qquad \sum (x_s-\xi)^2 = N\left[s^2 + (\bar{x}-\xi)^2\right].$$

Substituting these in the exponent of the above expression, the
probability of the sample is

$$\frac{1}{(2\pi)^{N/2}\sigma^N}\, e^{-\frac{N}{2\sigma^2}\left[s^2 + (\bar{x}-\xi)^2\right]}\,dx_1\,dx_2 \ldots dx_N.$$

Many of the possible samples from this population will have a
mean value x̄ and a standard deviation s, and to find the probability
of a sample with these two constants lying within ranges dx̄ and ds
of x̄ and s we must apply the first probability rule and integrate
the above expression for all samples having these two constants.
This is a mathematical step involving a transformation to polar
co-ordinates in N-dimensional space and a subsequent integration,
and leads to the following expression for the probability of a mean
of x̄ and a standard deviation of s:

$$df = K\, s^{N-2}\, e^{-\frac{N s^2}{2\sigma^2}}\, e^{-\frac{N(\bar{x}-\xi)^2}{2\sigma^2}}\,d\bar{x}\,ds$$

where K is a constant. This is the probability or proportionate
frequency distribution of the two statistical constants. Since the
term containing x̄ does not contain s, and vice versa, the converse
of the second rule of probability implies that x̄ and s have independent
probabilities, and the two distributions may be written
separately:

$$df = K_1\, e^{-\frac{N(\bar{x}-\xi)^2}{2\sigma^2}}\,d\bar{x}$$

and

$$df = K_2\, s^{N-2}\, e^{-\frac{N s^2}{2\sigma^2}}\,ds. \qquad (2.8)$$

By integrating the first expression over the whole range and equating
the result to unity, the constant is found to be

$$K_1 = \frac{\sqrt{N}}{\sigma\sqrt{2\pi}}.$$

The distribution of the mean is a normal one with mean equal
to ξ and standard deviation equal to σ/√N. That of the standard
deviation is more complicated.

* For the first time it is necessary to distinguish clearly between the
population value of a statistical constant and the value estimated from a
sample. Usually we shall denote population values by Greek letters and the
sample values by corresponding italic letters. This is why we have used ξ
instead of m in the equation for the normal distribution.
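The substitution in the exponent used in the derivation above rests on a simple algebraic identity, which may be set out as follows. This worked step is our own addition; it assumes that the sample standard deviation is defined with divisor N, i.e. s² = Σ(x_s − x̄)²/N, which is what the exponents in (2.8) imply.

```latex
\begin{align*}
\sum_{s=1}^{N} (x_s - \xi)^2
  &= \sum_{s=1}^{N} \bigl[(x_s - \bar{x}) + (\bar{x} - \xi)\bigr]^2 \\
  &= \sum_{s=1}^{N} (x_s - \bar{x})^2
     + 2(\bar{x} - \xi)\sum_{s=1}^{N} (x_s - \bar{x})
     + N(\bar{x} - \xi)^2 \\
  &= N s^2 + N(\bar{x} - \xi)^2 ,
\end{align*}
% the middle term vanishes because \sum (x_s - \bar{x}) = 0
```

The cross-product term disappears because deviations from the sample mean sum to zero, and this is what allows the exponent to separate into a part in s and a part in x̄.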



CHAPTER III 


ERRORS OF RANDOM SAMPLING AND STATISTICAL 

INFERENCE 

Much statistical work is concerned with attempts to learn something
of the characteristics of populations from measurements made
on finite representative samples. It is a fact of experience that successive
samples from one population usually differ amongst themselves,
and therefore in general they differ from the population.
They are subject to the so-called errors of sampling that bring 
uncertainty into any inferences that may be made regarding the 
population. This chapter is concerned with the theory of statistical 
inference, taking account of these errors; the applications of the 
theory are described for large samples only. 

Random and Representative Samples 

3.1. We shall deal only with simple random samples, in which, as
stated in section 2.13, every individual is independent of every
other. A sample of 1 000 men consisting of 500 pairs of brothers is
not random, for there is a tendency for brothers to be alike and
there are only 500 independent individuals; the others are related,
statistically as well as by blood, to the first 500. Special precautions
are necessary in practice to obtain satisfactory samples; bales of
cotton, sacks of corn and the like are seldom uniform, and a small
quantity taken from one place is not representative of the whole.
In such circumstances, it is necessary to select the individuals one
at a time, but where the bulk is well mixed and statistically homogeneous,
a group of individuals taken from any part is effectively a
random sample.

The theory of sampling describes the relations between samples 
and the infinite population, and this last will differ from the actual 
population or bulk if the sample is biased. In practice, it is the 
bulk that is of interest, and it is desirable to make the sample a
representative one by seeing that every individual in the bulk has
an equal chance of being included. This is not always easy. For
example, when taking cotton hairs singly from a bale, there is an
unavoidable tendency to take too many of the longer hairs. The method
of drawing individuals must be independent of their character.
The degree of elaboration necessary in a sampling technique to
ensure randomness and representativeness depends on the nature 
of the material being sampled, and the arrangement of a technique 
is more a matter for the experimentalist than for the statistician. 
Readers are advised, however, that investigation in particular fields 
has often shown that much more care in sampling is required than 
appeared to be necessary at first sight. Although it is impossible to 
give general rules, the following description of two particular 
sampling methods may be useful as illustrations. 

The first method is that applied to sampling cotton hairs. Hairs 
are taken from different parts of the bale, not singly but in tufts 
sufficiently large to avoid the bias towards length. Each tuft is then 
divided roughly into two halves and one half is discarded. The 
other is halved again, one half is discarded and the other is again 
divided. This is repeated until the final reduced tuft contains only 
a few hairs. A number of such reduced tufts is combined to form 
the final sample, every hair of which is measured. If at each division 
the half to be discarded is decided by lot, say by the toss of a coin, 
the final sample will be truly representative. But it will not be a 
simple random sample unless the final reduced tufts contain only 
one hair each. However, we shall show how to treat such complex 
samples in section 10.11.

The second method may often be applied when the population 
to be sampled is large but finite. Suppose we wish to inspect a 
random sample of houses in a large town. The town may be divided 
into wards, and the wards into streets. One ward may be chosen at 
random, a street may be chosen at random from those in the ward, 
and a house from those in a street; this house is a randomly chosen 
individual, and if the whole process is repeated several times, the 
resulting houses are a random sample of the town, provided the 
wards and streets are approximately equal in size. If the wards, the 
streets within each ward, and the houses in each street are all
systematically numbered, the problem of random selection resolves
itself into one of choosing a number at random. This process is
facilitated by Random Sampling Numbers (Tippett, 1927), which
is a collection of 40,000 numbers from 0 to 9, arranged in random
order. By combining an appropriate number of digits taken anywhere
from this collection, random numbers of almost any magnitude
may readily be obtained.
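The second method can be sketched in a few lines, with a pseudo-random number generator standing in for the printed Random Sampling Numbers. The town below is entirely hypothetical, and, as the text requires, its wards and streets are made equal in size so that every house has the same chance of selection.

```python
import random

def random_house(town, rng):
    """Choose a ward, then a street in it, then a house number, each at random."""
    ward = rng.choice(sorted(town))
    street = rng.choice(sorted(town[ward]))
    house = rng.randrange(1, town[ward][street] + 1)  # houses numbered 1..count
    return ward, street, house

# Hypothetical town: ward -> street -> number of houses (all equal in size).
town = {
    "Ward 1": {"Street A": 40, "Street B": 40},
    "Ward 2": {"Street C": 40, "Street D": 40},
}

rng = random.Random(1927)  # fixed seed, for repeatability only
sample = [random_house(town, rng) for _ in range(10)]
print(sample)
```

Because each draw starts afresh from the whole town, two houses may fall in the same ward or street; forbidding this would give a stratified rather than a simple random sample.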

It should be noted that if arrangements are made to represent
each ward equally in the sample or to see that no two houses are
in the same street, the sample is representative but not random. 
Such may be called a stratified sample, and its random errors are 
dealt with in section 10.13. 

The theory of random sampling is sometimes applied to experimental
errors in physical determinations. It is not unreasonable to
regard the limited number of measurements that may actually be 
made of a physical quantity as a random sample of an infinite 
population of measurements that could conceivably be made under 
the same conditions. The average of this population, however, is 
not necessarily the true value of the quantity; it is affected by errors 
of bias due to the particular conditions of the experiment, to such 
factors as the personal error of the observer and idiosyncrasies of 
the apparatus. Since in most physical experiments errors of bias 
are liable to be of the same order of magnitude as random errors, 
the theory of random sampling is of very limited utility in this field. 

The qualities of randomness and representativeness do not depend 
on the size of the sample; a sample of five, if properly chosen, is 
as random and representative as one of five thousand. In this chapter 
we are limiting, not the general theory, but the application of the 
theory to large samples, i.e. samples of at least a few hundreds, and 
this is done to simplify matters. The more exact theory for small 
samples will be given in Chapter V. 

We shall again make the further assumption that the population 
sampled is infinite in the sense that its composition is not appreciably
altered by the abstraction of the sample. This is satisfied,
practically, if the individuals are drawn one at a time from even a 
very small bulk provided they are returned to the bulk and well 
mixed with the other individuals between each draw. 

Tests of Significance 

3.2. Many scientific investigations involve the employment of the
method of framing working hypotheses and testing them experimentally.
As long as the experiments fail to disprove them, so long
are the hypotheses accepted. This is the general method by which 
many statistical inferences are made. A hypothetical population of 
certain characteristics is postulated, and if the sample is such that 
it could reasonably have come from that population, the hypothesis 
is accepted. Owing to sampling errors, however, there is no sharp 
dividing line between samples that could have come from the
hypothetical population and those that could not. It is only possible
to give a probability that a sample like the one observed could have 
come from the population. If the probability is low, the hypothesis 
is rejected; if it is high, the hypothesis is accepted and the deviation 
between the sample and postulated population is attributed to 
errors of sampling. The sample may be characterised by one or more 
statistical constants, and in the next paragraph the process will be 
illustrated for the mean of samples. 

If there is some ground for assuming* the population to be normal
with a given mean and standard deviation, the distribution of the
means of samples of any given size is the sampling distribution
described in section 2.71, and from the probability integral, the
probability of a sample mean deviating from the population mean
by more than any value may be deduced. For example, we see from
Table 2.5 that the probability of a sample mean differing from the
population mean by plus or minus three or more times the standard
error is 0.002 70. This is low, and if any actual sample does differ
in mean from the population value by more than this amount,
we infer that the deviation may not reasonably be attributed to
random errors and the hypothesis regarding the population has
to be abandoned.

The probability of a random deviation exceeding any given value 
is called the level of significance of the deviation, and is often 
expressed as a percentage. For example, a deviation of ±3 times
the standard error is on the 0.27 per cent. level of significance and
one of 1.0 times the standard error is on the 32 per cent. level.
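The level of significance of a deviation, as defined above, is the sum of the two tails of the normal curve beyond that deviation. A minimal sketch, with a function name of our own choosing, is:

```python
import math

def level_of_significance(deviation_in_se):
    """Probability of a random deviation exceeding the given one in either direction.

    The deviation is expressed in units of the standard error; for a normal
    sampling distribution the two-tail probability is erfc(d / sqrt(2)).
    """
    return math.erfc(abs(deviation_in_se) / math.sqrt(2))

for d in (1.0, 2.0, 3.0):
    print(d, f"{100 * level_of_significance(d):.2f} per cent")
```

The values for deviations of 1, 2 and 3 times the standard error correspond to the last column of Table 2.5.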

By way of example we will assume the lengths of 4 000 hairs of
an Indian cotton given by Koshal and Turner (1950) to be an
infinite population.† Their mean length is 2.33 cm. and the standard
deviation is 0.480 6 cm. The first thousand hairs were selected by
a different method from the rest and gave a mean of 2.54 cm. Is
this deviation compatible with the hypothesis that the 1 000 are a
random sample from the 4 000 and that the difference in means is
due to random errors, or is the difference large enough to indicate
that the change in technique has had an effect? The standard error
of the mean is

$$\frac{0.480\,6}{\sqrt{1\,000}} = 0.015\,2,$$

and the deviation of 0.21 cm. is over 13 times the standard error.
Sheppard's normal probability tables show that this is far beyond
the 0.000 001 level of significance, and the hypothesis is untenable.

* There is no fundamental distinction between an hypothesis and the
assumptions (e.g. normality). However, a good test is sensitive to falseness
in the hypothesis, which is what we want to test, and comparatively
insensitive to errors in the assumptions, which are subsidiary.

† This assumption is an approximation, for 4 000 is not infinitely large
compared with 1 000, the size of the sample we are testing.
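The cotton-hair test can be followed step by step in a short calculation. This is a sketch only; the variable names are ours, and the complementary error function replaces the reference to Sheppard's tables.

```python
import math

sigma = 0.4806           # standard deviation of the 4 000 hairs, cm.
n = 1000                 # size of the sample under test
population_mean = 2.33   # cm.
sample_mean = 2.54       # cm.

standard_error = sigma / math.sqrt(n)
deviation = sample_mean - population_mean
ratio = deviation / standard_error        # deviation in units of the standard error
level = math.erfc(ratio / math.sqrt(2))   # two-tail probability of so large a deviation

print(standard_error, ratio, level)
```

The ratio exceeds 13 and the corresponding probability is far smaller than 0.000 001, in agreement with the conclusion that the hypothesis is untenable.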

Any deviation that is large enough and is on a sufficiently low
probability level to lead to a rejection of the hypothesis regarding
the population is said to be statistically significant and the mean of
the sample is said to be significantly different from that of the population.
The question arises: at what probability level does a deviation
become significant? There is no rational probability level at which
possibility ceases and impossibility begins, but it is conventional
to regard a probability of 0.05 as the critical level of significance.
The considerations that govern the choice of this level will be discussed
in section 3.3, and it is sufficient here to state that this
convention has been found to give a satisfactory rule of action in
most circumstances. It will be seen from Table 2.5 that a deviation
of twice the standard error corresponds roughly to a probability
level of 0.05, so we have the following working rule:

    A deviation of a sample mean from an assumed population
    value of twice the standard error lies on the 0.05 (or 5 per cent.)
    level of significance, and a deviation greater than this amount is
    statistically significant.

As a measure of dispersion of the sampling distribution of sample
means, the quartile deviation is sometimes used instead of the
standard error, and is called the probable error. The probable error
is 0.674 49 times the standard error, and three times the probable
error is roughly equivalent to twice the standard error. The probable
error has no particular advantages, and as it involves the troublesome
factor 0.674 49, it is rapidly going out of use.
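The factor relating the probable error to the standard error is simply the upper quartile of the standardised normal curve, and the rough equivalence of three probable errors to two standard errors can be verified in a few lines. The names below are our own.

```python
from statistics import NormalDist

std = NormalDist()

# Deviation (in standard errors) enclosing the central half of the distribution.
probable_error_factor = std.inv_cdf(0.75)

# Three probable errors expressed in standard errors, and the two-tail
# probability beyond that deviation.
three_probable_errors = 3 * probable_error_factor
level_at_three_pe = 2 * (1 - std.cdf(three_probable_errors))

print(probable_error_factor, three_probable_errors, level_at_three_pe)
```

Three probable errors come to a little more than two standard errors, so the two criteria correspond to nearly, but not exactly, the same level of significance.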


Significance of Difference between Two Sample Means 

3.21. The most common situation is one in which the investigator
has two samples, and wishes to know if their differences are real or
may be attributed to errors of random sampling. Again we shall
confine attention to the two means. The appropriate hypothesis is
that the two samples are from populations having the same mean,
and the probability level of the observed difference is calculated
accordingly. Again the hypothesis is accepted if the level is fairly
high and there is no statistically significant difference between the
means; if the level is low (say below 0.05), the hypothesis is rejected
and the difference is real. To calculate these probabilities it is
necessary to know the sampling distribution of the difference
between two means.

Let the total numbers of individuals in the two samples be 
and N2. Then it is possible to imagine a sampling experiment in 
which a very large number of pairs of samples are taken from the 
respective populations, the number of individuals in the first of each 
pair being Nj and the number in the second being Ng* For each 
sample the mean may be found, and hence for each pair, the 
difference between the two means. There will be as many differ¬ 
ences as there are pairs of samples, and these may be formed into 
a frequency distribution—the sampling distribution of the difference 
between two means. This distribution has been deduced mathematically 
and is normal with a mean value of zero, as may be 
expected, and a standard deviation (or standard error) larger than 
the standard error of either sample mean taken separately. If SE_1 
is the standard error of the distribution of one series of means and 
SE_2 is that of the other, the standard error of the distribution of 
differences is 

    SE_{1-2} = \sqrt{(SE_1)^2 + (SE_2)^2} ¹        (3.1) 

provided the two samples are independent. Further, if σ_1 and σ_2 
are the standard deviations of the individuals in the two populations, 
the standard error of the difference between the two means is 



    SE_{1-2} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}        (3.2) 


Usually, σ_1 and σ_2 do not differ appreciably, and it is reasonable 
to include in the hypothesis the postulate that the two populations 
are the same, so that σ_1 = σ_2. 

In these circumstances, if the two single samples are of the same 
size they have the same standard error, and the “twice the standard 
error” criterion leads to the working rule that differences greater 
than three times the standard error of a single mean are significant; 
for if SE_1 = SE_2, SE_{1-2} = \sqrt{2} SE_1, and 2\sqrt{2} is nearly equal to 3. 

¹ This follows easily from the equations in section 3.55. 

In order to compute the standard error, it is necessary to know σ, 
the standard deviation of the individuals in the population. Frequently 
this is unknown, and as an approximation an estimate s 
obtained from the samples is used instead. This practice limits the 
application of the theory to large samples. Where there are two 
samples and a common standard deviation is assumed, some combined 
estimate of σ should be used since the hypothesis is that the 
samples are from the same population. If s_1 and s_2 are the two 
sample estimates, a suitable combined estimate is 

    s = \sqrt{\frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2}}        (3.3) 

and substituting this value of s for the values of σ in equation (3.2) 
we have 

    SE_{1-2} = s \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}        (3.4) 


Frequently, however, the standard errors of separate means are 
calculated more or less as a routine and it is convenient to use 
these directly in equation (3.2) leading to the expression 

    SE_{1-2} = \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}        (3.5) 


This latter course is not consistent with the hypothesis that the 
samples are from the same population, but when the samples are 
large the two courses usually lead to practically the same results.* 
The second is used in the following example. 

The British Association Report for 1883 gives on p. 256 distri¬ 
butions of the heights of men born in England and Scotland; as an 
example we will test the significance of the difference between the 
two means. The necessary data are in Table 3.1. 

* The second course is consistent with the original hypothesis that the 
samples are from two populations with the same mean and different standard 
deviations. This is a legitimate assumption, but is not usually as acceptable 
on other grounds as the one given above. 

The difference in height is 1.108 1 inches, and its standard error 
is 

    \sqrt{0.032 38^2 + 0.068 68^2} = \pm 0.075 9; 

1.108 1 is 14.6 times 0.075 9; we thus conclude that Scotsmen 
are really taller than Englishmen. 
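
The arithmetic of this example may be checked by a short calculation. What follows is only a sketch in Python, which is no part of the original treatment; the function names are chosen here for illustration, and the figures are those of Table 3.1. It evaluates the standard error of the difference by both courses, from the separate estimates as in equation (3.5) and from the combined estimate of equations (3.3) and (3.4).

```python
import math

def se_mean(sd, n):
    """Standard error of a mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

def se_diff_separate(sd1, n1, sd2, n2):
    """Equation (3.5): separate estimates of the standard deviation."""
    return math.sqrt(se_mean(sd1, n1) ** 2 + se_mean(sd2, n2) ** 2)

def se_diff_pooled(sd1, n1, sd2, n2):
    """Equations (3.3) and (3.4): one combined estimate s."""
    s = math.sqrt((n1 * sd1 ** 2 + n2 * sd2 ** 2) / (n1 + n2))
    return s * math.sqrt(1 / n1 + 1 / n2)

# Figures of Table 3.1 (heights in inches).
n_eng, mean_eng, sd_eng = 6194, 67.4375, 2.548
n_sco, mean_sco, sd_sco = 1304, 68.5456, 2.480

difference = mean_sco - mean_eng
separate = se_diff_separate(sd_eng, n_eng, sd_sco, n_sco)
pooled = se_diff_pooled(sd_eng, n_eng, sd_sco, n_sco)

print(f"difference      = {difference:.4f}")
print(f"SE (separate)   = {separate:.4f}")
print(f"SE (pooled)     = {pooled:.4f}")
print(f"difference / SE = {difference / separate:.1f}")
```

The two courses give standard errors of about 0.076 and 0.077, so that with samples of this size the conclusion does not depend on the choice between them.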

General Discussion 

3.3. We shall first discuss the choice of the level of statistical 
significance. If we work consistently according to the rule that the 
hypothesis is acceptable provided the observed difference or devia¬ 
tion is below a given level and is rejected if the deviation is above, 
one of four possible situations may arise*: 

We may be in error because we: 

(1) Reject a true hypothesis, or 

(2) Accept a false one; 


TABLE 3.1 

Country     Number in    Mean Height    Standard Deviation    Standard Error of Mean 
            Sample       (inches)       (inches) 

England     6 194        67.4375        2.548                 2.548 ÷ √6 194 = ± 0.032 38 
Scotland    1 304        68.5456        2.480                 2.480 ÷ √1 304 = ± 0.068 68 


or we may be correct because we: 

(3) Accept a true hypothesis, or 

(4) Reject a false one. 

In accepting a hypothesis according to the rule, we only do so 
tentatively, since no hypothesis is, or ever can be, finally proved. 

In the long run of statistical experience, the ratio of wrong 
inferences under (1) to total inferences under (1) and (3) when the 
hypothesis is a true one can be made as low as we please by making 
the level of significance high†; indeed, this ratio is the probability 
level, and the rule already proposed leads to a true hypothesis being 
rejected once in every twenty experiments for which a true hypothesis 
is made. By adopting a level of 0.01, this kind of error is 
made only once in every hundred experiments. Unfortunately, by 
using a higher level of significance, we are more likely to accept 
the hypothesis and hence are more likely to be wrong under situation 
(2). Thus the choice of the level for a criterion of acceptance or 
rejection is a compromise between the risks of the two types of error. 

* This analysis was given by Neyman and Pearson (1928), and some 
writers have accepted the classification to the extent of referring to 
situations (1) and (2) as “errors of the first and second kinds.” 

† A low probability corresponds to a high level of significance. 

Unfortunately, it is not possible to state the proportion of 
erroneous inferences under (2) to the total experiments in which a 
false hypothesis has been made. It will depend to a considerable 
extent on how near the hypothesis is to the truth as well as on the 
stringency of the test. 
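
Although no single figure can be given, the dependence may be illustrated for the normal case. The following Python sketch is an illustration only and not part of the original argument; the true departures from the hypothesis (`true_shift`, in units of the standard error) are assumed values, and the function names are hypothetical. For each assumed departure it gives the chance that the deviation observed falls within the limits of acceptance, so that a false hypothesis is accepted.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def critical_deviation(level):
    """Two-tailed critical deviation, in units of the standard error."""
    return STANDARD_NORMAL.inv_cdf(1 - level / 2)

def chance_of_accepting(true_shift, level):
    """Probability that the observed deviation lies inside the limits
    of acceptance when the true value differs from the hypothetical one
    by `true_shift` standard errors (an assumed quantity)."""
    c = critical_deviation(level)
    upper = STANDARD_NORMAL.cdf(c - true_shift)
    lower = STANDARD_NORMAL.cdf(-c - true_shift)
    return upper - lower

for shift in (0.0, 1.0, 2.0, 3.0, 4.0):
    at_05 = chance_of_accepting(shift, 0.05)
    at_01 = chance_of_accepting(shift, 0.01)
    print(f"shift {shift:.0f} SE: accepted {at_05:.3f} (0.05 level), "
          f"{at_01:.3f} (0.01 level)")
```

With no departure the hypothesis is accepted 95 times in 100 on the 0.05 level, as it should be; as the departure grows the chance of wrongly accepting falls, and at every departure it is greater for the more stringent 0.01 level.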

The choice of the critical level of significance will depend to some 
extent on circumstances, but it is governed largely by the following 
consideration. We usually have a large body of knowledge and a 
scientific tradition to guide us, so that there is a fairly strong pre¬ 
disposition towards accepting any hypothesis we make. One of the 
strongest traditions favours simple hypotheses involving few con¬ 
stants in preference to complex ones involving many constants. For 
example, in investigating a possible difference in height between 
Englishmen and Scotsmen as in paragraph 3.21, it was in accordance 
with scientific practice to prefer the hypothesis of a single mean 
height to one involving two heights, i.e. to assume no difference 
until the contrary was proved. It is because of this, and because the 
effect of an error under (2) is less serious than one under (1) that 
the chosen criterion of significance usually has a probability level 
as low as 0.05, and in instances where there are exceptionally strong 
a priori reasons for preferring the hypothesis or an error under (1) 
may have important consequences, the appropriate probability level 
is 0.01 or even lower. On the other hand, most hypotheses tend to 
be of a negative character in that they are in accordance with existing 
knowledge, and by choosing too low a probability corresponding to 
a level of significance that is too high, the proportion of mistaken 
inferences of the second kind may be too great, and advance of 
knowledge may be unjustifiably impeded. Too much scepticism may 
be obstructive. 

3.31. It can be seen that one means of reducing the effect of 
erroneous inferences of the second kind without affecting those of 
the first is by reducing the standard error of the statistical constant 
under test. For a given level of significance, a smaller standard error 
leads to a correspondingly smaller deviation lying just on the chosen 
level of significance, and so to a smaller deviation wrongfully dis¬ 
counted as not significant. The standard error may be reduced by 
increasing the size of the sample, and we shall see in section 3.7 
that from this point of view some statistical constants are preferable 
to others that measure equivalent properties of the frequency 
distribution. 


The statistical significance gives no information as to the magnitude 
or practical importance of any difference; that can only be judged 
by one with technical knowledge of the subject to which these 
methods are applied. A very large sample may make very small and 
unimportant differences overwhelmingly significant, while if the 
sample is small, large and important differences may be obscured by 
random errors. The verdict “not significant,” therefore, is more like 
the “not proven” of Scots law than “not guilty.” If an approximate 
value of the standard deviations of the populations is available (say 
from preliminary samples), it is always possible to estimate how 
large the samples must be to show up a given difference; N should 
be amply big enough to make twice the standard error of the difference 
less than the specified difference. 
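
As a sketch of this calculation, suppose two samples each of N individuals from populations with a common standard deviation σ, so that by equation (3.2) the standard error of the difference is σ√(2/N). Twice this is less than a difference d when N exceeds 8σ²/d². The Python lines below are an illustration under these assumptions; the values of σ and d are invented for the purpose and the function name is hypothetical.

```python
import math

def smallest_sample(sigma, difference):
    """Smallest N (per sample) for which twice the standard error of the
    difference between two means, 2 * sigma * sqrt(2 / N), is less than
    the specified difference. Assumes equal N and a common sigma."""
    return math.floor(8 * sigma ** 2 / difference ** 2) + 1

sigma = 2.5        # assumed standard deviation, inches (illustrative)
difference = 1.0   # difference it is desired to show up, inches (illustrative)

n = smallest_sample(sigma, difference)
twice_se = 2 * sigma * math.sqrt(2 / n)
print(n, round(twice_se, 3))
```

The figure so found is a bare minimum; in practice N should be taken amply larger, since a true difference equal to the specified one would then be declared significant only about half the time.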

If a sample mean is significantly different from a hypothetical 
population value or the difference between two sample means is 
significant, statistical theory can shed no light on the cause of the 
deviation. It may be either that the bulk sampled is really different 
from the hypothetical population or that the sample is not truly 
representative and random; either that there is a real difference 
between the two populations sampled or that there is some 
difference in sampling technique. 

It may be noted that there is nothing in these tests of significance 
that provides a criterion for choosing between hypotheses that may 
be compatible with the data. Such a choice must be made on non- 
statistical grounds. 

3.32. So far we have taken as the 0.05 level of significance a positive 
or negative deviation for which the two “tails” of the probability 
curve total 0.05, i.e. each “tail” separately is 0.025. This is because 
we usually have no a priori reason for deciding that the deviation 
or difference should have one sign or another. In the example of 
section 3.2 treating of the lengths of cotton hairs, the mean of the 
sample of 1 000 happened to be greater than the assumed population; 
we should have tested the deviation in the same way had the 
mean been less. In the example of section 3.21 the Scotsmen were 
apparently taller, on the average, than the Englishmen; we should 
have tested the difference had the Scotsmen been shorter. In cir¬ 
cumstances like these, the probabilities must be added for both 
positive and negative deviations or differences, so as to make the 
calculation correspond to practice and to give correctly for the long- 
run of experience the ratio of mistaken to total inferences under 
situations (1) and (3) above. For differences, the argument may be 
put in another way which may be more helpful. Our hypothesis is 
that the true value is zero, and the question we ask is, “What is the 
probability that random sampling will give a difference greater than 
that observed?” Now we do not know the sign, and it is purely an 
accident whether in comparing two means, x̄_1 and x̄_2 (say), we take 
(x̄_1 − x̄_2) or (x̄_2 − x̄_1), so we may always make the difference positive 
(= +d, say). Then in considering the sampling curve we must also 
do the same thing and only use the positive half of it, in which 
differences can only range from zero to plus infinity. Thus the 
probability that of all the positive differences that could be obtained 
from random sampling, one would be greater than +d is the ratio 
of the area beyond the ordinate at +d to the total area of the halved 
normal curve. In Fig. 3, OE is twice the standard error, and the 
area under the curve to the right of EF is 5 per cent. of the half of 
the curve to the right of the origin, O, or 2.5 per cent. of the full 
curve. Thus, there may be a distinction between the 5 per cent. 
value or point at which an ordinate cuts off a tail of 5 per cent. of 
the total area, and the 5 per cent. level of significance. 

Sometimes, it is reasonable to regard a deviation corresponding 
to a single “tail” of 0.05 as lying on the 0.05 level of significance. 
For example, a strong believer in Scottish superiority might argue 
that Scotsmen cannot possibly be shorter than Englishmen, that 
they must either be equally tall or taller. He would only subject to 
statistical test a difference that showed Scotsmen as taller, and any 
difference of the opposite sign he would dismiss without test as 
being due to random errors. It is consistent with such an attitude 
to regard a difference of 1.65 times the standard error, corresponding 
to a single “tail” of 0.05, as lying on the 5 per cent. level of significance. 
It is only legitimate to do this, however, if the grounds for 
deciding the sign of the difference or deviation (if real) are a priori 
and independent of the result given by the sample. 
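
The areas concerned are readily found from the normal curve. The sketch below, in Python and using only its standard library, is an illustration and not part of the original text; the function names are chosen here. It gives the area of both tails beyond twice the standard error and of a single tail beyond 1.65 times the standard error.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def single_tail(deviation):
    """Area of one tail beyond `deviation` standard errors."""
    return 1 - STANDARD_NORMAL.cdf(deviation)

def both_tails(deviation):
    """Area of the two tails beyond +/- `deviation` standard errors."""
    return 2 * single_tail(abs(deviation))

print(f"both tails beyond 2.00 SE : {both_tails(2.0):.4f}")
print(f"single tail beyond 1.65 SE: {single_tail(1.65):.4f}")
print(f"single tail beyond 2.00 SE: {single_tail(2.0):.4f}")
```

Both are close to 0.05, which is why 1.65 times the standard error serves for the single-tail test in the same way as twice the standard error serves for the ordinary one.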


Groups of Samples 

3.33. The probabilities deduced in accordance with the foregoing 
theory are only for single pairs of samples taken at random, and when 
there are several samples, the problem of significance is more complicated. 
If we had a hundred differences whose true value was zero, 
we should expect four or five of them to be greater than twice the 
standard error, but if we applied the simple theory by “rule-of-thumb,” 
we should erroneously report them as real. Similar considerations 
must be applied to groups of less than one hundred tests 
and we will work out the significance of the biggest of n deviations. 

TABLE 3.2 

Largest Deviation ÷ Standard Error of n Lying on 0.05 Level of Significance 

n      Deviation ÷ Standard Error 

1      2.0 
2      2.2 
4      2.5 
6      2.6 
10     2.8 

Suppose there are n independent differences between pairs of 
means; e.g. the aim may be to see if several treatments have an 
effect on some quantity and one of each pair of samples may be an 
untreated control (a separate one for each pair) and the other be 
given one of the treatments. We wish to test if the biggest ratio 
difference/standard error of difference is significant. Let the probability 
that errors of random sampling would give a single difference 
equal to or greater than d (say) be P (as found in the ordinary way 
from probability tables), and let P_n be the probability that the 
biggest of n would equal or exceed d. Then (1 − P_n) is the probability 
that all n differences would be less than d, and (1 − P) that one 
random difference would be less; from Rule II (section 2.11) we have 

    (1 - P_n) = (1 - P)^n, 

whence 

    P_n = 1 - (1 - P)^n. 

Table 3.2 has been worked out to show what value the largest of 
n deviations (in terms of the standard error) must reach to lie on 
the 0.05 level of significance for the normal distribution. 
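
The entries of Table 3.2 may be recovered by inverting the relation just given: for P_n = 0.05 the single probability is P = 1 − (1 − P_n)^(1/n), and the deviation having this two-tailed probability is then found from the normal curve. The Python sketch below is an illustration only; the function name is hypothetical, and the table is assumed to refer to n = 1, 2, 4, 6 and 10.

```python
from statistics import NormalDist

def largest_of_n_critical(n, level=0.05):
    """Deviation (in standard errors) that the largest of n independent
    deviations must reach to lie on the given level of significance."""
    # P_n = 1 - (1 - P)^n  implies  P = 1 - (1 - P_n)^(1/n)
    single_p = 1 - (1 - level) ** (1 / n)
    # P is the total of both tails, so each tail is P / 2.
    return NormalDist().inv_cdf(1 - single_p / 2)

table = {n: round(largest_of_n_critical(n), 1) for n in (1, 2, 4, 6, 10)}
for n, deviation in table.items():
    print(n, deviation)
```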

Sometimes we want to know if the difference between the largest 
and smallest in a group of sample means is significant; such a 
difference is a range. E. S. Pearson (1932) gives some values of 
the range at various levels of significance and Table 3.3 has been 
calculated from them. When there are four samples, a difference of 
2.5 times is equivalent to one of twice the standard error for a pair. 

TABLE 3.3 

Range ÷ Standard Error of a Difference on 0.05 and 0.01 Levels of Significance 

Number of Samples    P = 0.05    P = 0.01 

2                    2.0         2.6 
4                    2.6         3.1 
6                    2.9         3.4 
10                   3.2         3.6 

We shall illustrate this by the data of Table 3.4, which have been 
obtained from Table 10.6 showing the mean corn yield per plot 
for a number of agricultural plots subjected to five treatments. 


TABLE 3.4 

Treatment    Mean Yield per Plot, Grammes 

A            295.2 
B            297.5 
C            276.3 
D            272.2 
E            271.8 


We are given that the standard error of any one mean is 8.712, 
so that the standard error of any random difference between two 
means is 12.32 grammes.* The largest difference is between treatments 
B and E, and is 25.7 or 2.1 times the standard error. For a 
single randomly chosen pair of means, this difference would be 
significant, but here it is the largest difference in a set of five means, 
and from Table 3.3 we deduce that it is well below the 5 per cent. 
level of significance. 

* The treatments are “dummies.” The standard error is deduced from 
Table 10.71. 
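
The comparison may be set out as a short calculation. The Python sketch below is an illustration only. It takes the mean for treatment B as 297.5, the value implied by the stated difference of 25.7 from treatment E, and since Table 3.3 has no entry for five samples it compares the ratio with the entries for four and six samples, between which the value for five must lie.

```python
import math

# Means of Table 3.4 (grammes); B is taken as 297.5, see the note above.
mean_yield = {"A": 295.2, "B": 297.5, "C": 276.3, "D": 272.2, "E": 271.8}
se_single_mean = 8.712

# Standard error of a difference between two independent means.
se_difference = se_single_mean * math.sqrt(2)

highest = max(mean_yield, key=mean_yield.get)
lowest = min(mean_yield, key=mean_yield.get)
yield_range = mean_yield[highest] - mean_yield[lowest]
ratio = yield_range / se_difference

# Entries of Table 3.3 on the 0.05 level for four and six samples.
critical_four, critical_six = 2.6, 2.9

print(highest, lowest, round(yield_range, 1),
      round(se_difference, 2), round(ratio, 1))
print("below the 0.05 level for the range:", ratio < critical_four)
```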

If it is desired to compare the means of several samples as a whole, 
there are other and more suitable methods that will be described 
in Chapter VI; but if the wish is to compare selected pairs, the 
above considerations show at least that more stringent tests of 
significance should be applied than when a single pair is taken at 
random. 

Applicability of Normal Sampling Theory 

3.4. The correspondence of a deviation of twice the standard error 
to the 0.05 level of significance depends on the sampling distribution 
of the mean being normal. Strictly, this only happens when 
the individuals in the sampled population are normal, but for large 
samples taken from most non-normal populations that are likely to 
be met with in practice, the distribution of the mean is nearly 
normal, and the normal sampling theory is sufficiently close an 
approximation for practical purposes. This frequently is a second 
reason for applying the methods of this chapter to large samples only. 

The population and a sample may be characterised by constants 
other than the mean, and in fact any of the constants mentioned 
in Chapter I may be used. These all have their sampling distribu¬ 
tions and, if they are known, deviations corresponding to various 
levels of significance may be calculated. For large samples, however, 
most of these distributions tend to be nearly normal, and as an 
approximation it is usual to apply the normal theory by calculating 
the standard error of the constant, and regarding deviations of twice 
this standard error as lying on the 5 per cent. level of significance. 
In this chapter, further applications of sampling theory will usually 
follow this procedure, but it is desirable for the sake of subsequent 
applications to discuss the use of skew sampling distributions. This 
discussion follows in the next section. 

Tests of Significance Based on Skew Sampling Distributions 


3.41. For a symmetrical distribution we decided in section 3.32 
to regard the 2.5 per cent. points (on the positive and negative 
sides of the population value) as lying on the 5 per cent. level of 
significance, and since the two deviations are equal, only one need 
be specified. When the distribution is not symmetrical, some modification 
is obviously necessary, and there are three possibilities. 

(1) We may retain the equality of the positive and negative deviations, 
and regard as lying on the 5 per cent. level of significance a 
value such that the areas of the two tails beyond ordinates at these 
points together add up to 5 per cent. of the total area. In Fig. 6 
(which is not drawn to any scale, and is only diagrammatic), O is 
the population value, and OA = OA' is the significant deviation, 
the areas beyond the ordinates at A and A' being 5 per cent. of the 
total. 

[Fig. 6.] 

Such a criterion is to be condemned, for it gives positive and 
negative deviations equal importance. 

(2) Extending the argument already applied to differences in 
section 3.32, we may use separate tests for positive and negative 
deviations from the population value. If the deviation under test is 
positive, we use only that part of the sampling distribution on the 
positive side of the ordinate drawn at the population value, and regard 
as lying on the 5 per cent. level the deviation at which an ordinate 
cuts off a tail of 5 per cent. of the positive part of the curve. Similarly, 
if the deviation is negative we use only the negative part of the 
sampling distribution. In Fig. 6, OB and OB' are the two deviations, 
and the area to the left of the ordinate at B' is 0.05 of that to the 
left of OM, while the area to the right of the ordinate at B is 0.05 
of that to the right of OM. Because OM does not divide the total 
area into equal parts, the two tails are not equal and therefore are 
not 0.025 of the total area. 

(3) We may regard as lying on the 5 per cent. level of significance 
the deviations beyond which the two tails are each equal to 2.5 per 
cent. of the total area under the curve. In Fig. 6, C and C' are about 
on these points. The shape of any frequency curve may be altered 
by plotting the abscissae in terms of some function of x (e.g. putting 
x' = log x), and it is theoretically possible to choose the function 
so that the curve reduces to the normal form. Such a procedure 
alters the relative positions of the mean and mode, but not of deviations 
defined by the ratios in which they divide up the area. Consequently 
2.5 per cent. values of the transformed variate x' still 
remain 2.5 per cent. values of the untransformed variate x in the 
skew distribution. If the normal curve is regarded as the fundamental 
curve of errors, the deviations lying on the 5 per cent. level of 
significance according to this criterion would satisfy the normal 
criterion if the curve were transformed. 

At the moment, both the second and third uses of skew curves 
for testing the significance of differences appear to be consistent 
extensions of the criteria developed for symmetrical curves, and it 
is difficult to decide between them. Fortunately, the difference in 
results obtained by the two methods is of no practical importance, 
and the last one will probably be most convenient in practice. 

Sampling Errors of Various Constants 
Means of Binomial and Poisson Distributions 

3.51. When developing the binomial distribution in Chapter II 
we regarded the sets of trials as individuals and the number of 
occurrences per set as the variate. If, however, we regard the trials 
as individuals for which the variate can only take one of two characteristics, 
viz. success or failure, the set of n trials becomes a sample 
of n and the binomial distribution becomes a sampling distribution 
of the number of successes per sample. From this point of view, the 
small n of the binomial is equivalent to the large N of this chapter. 
As stated in section 2.5, when the number in the sample is large, 
the binomial distribution approaches the normal and the normal 
sampling theory may be applied. 


For the binomial the standard error of the number of successes 
(l = np) is the standard deviation given in equation (2.2), p. 48, 
and is 

    \sqrt{np(1 - p)} = \sqrt{l - \frac{l^2}{n}}; 

on dividing this by n we obtain 

    Standard Error of p = \sqrt{\frac{p - p^2}{n}}. 

Darbishire (1904), on crossing waltzing with normal mice, found 
in the F_2 generation 458 normals and 97 waltzers (total 555). The 
Mendelian expectation for l is 416 normals with a standard error of 

    \pm \sqrt{416 - \frac{416^2}{555}} = \pm 10.2. 

The difference between the actual and expected number of normal 
mice is 42, and being 4.1 times the standard error is significant; it 
could only arise from random sampling four times in 100 000 trials. 

TABLE 3.5 

Diet                    Males    Females    Total Young    Percentage Males 

Vitamin B Deficient     123      153        276            44.57 
Vitamin B Sufficient    145      150        295            49.15 

Totals                  268      303        571            - 

Parkes and Drummond (1925) give the data of Table 3.5, showing 
the effect of vitamin B deficiency on the sex-ratio of the offspring 
of rats. 

In comparing the sex-ratios, we are not comparing an experimental 
ratio with a theoretical one, but two experimental ones. Hence, we 
do not know the values of p in the infinite population required for the 
calculation of the standard errors; we will use the sample values as 
an approximation. The standard errors of p are 

    \pm \sqrt{\frac{0.445 7 - 0.198 6}{276}} = \pm 0.029 9 

and 

    \pm \sqrt{\frac{0.491 5 - 0.241 6}{295}} = \pm 0.029 1, 

and that of the difference is ± 0.041 7, or ± 4.17 per cent. The 
difference in sex-ratio is 4.58, and being only 1.10 times its standard 
error is insignificant; it would arise from random errors nearly 
27 times in 100 trials. 
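
The two binomial comparisons of this section may be verified together. The Python sketch below is an illustration and not part of the original text. For the mice it assumes the Mendelian ratio of 3 normals to 1 waltzer and uses the unrounded expectation 416.25 where the text takes 416; for the rats it uses the counts of Table 3.5. The probabilities are two-tailed areas of the normal curve.

```python
import math
from statistics import NormalDist

def two_tailed_probability(ratio):
    """Area of both tails beyond +/- ratio standard errors."""
    return 2 * (1 - NormalDist().cdf(abs(ratio)))

# Darbishire's mice: 458 normals out of 555; 3:1 expectation assumed.
n_mice, normals, p_expected = 555, 458, 0.75
expected = n_mice * p_expected
se_count = math.sqrt(n_mice * p_expected * (1 - p_expected))
mice_ratio = (normals - expected) / se_count

# Table 3.5: proportion of males on the two diets.
males_def, total_def = 123, 276
males_suf, total_suf = 145, 295
p_def = males_def / total_def
p_suf = males_suf / total_suf
se_def = math.sqrt(p_def * (1 - p_def) / total_def)
se_suf = math.sqrt(p_suf * (1 - p_suf) / total_suf)
se_diff = math.sqrt(se_def ** 2 + se_suf ** 2)
sex_ratio = (p_suf - p_def) / se_diff

print(f"mice: SE = {se_count:.1f}, ratio = {mice_ratio:.1f}, "
      f"P = {two_tailed_probability(mice_ratio):.6f}")
print(f"rats: SE of difference = {se_diff:.4f}, ratio = {sex_ratio:.2f}, "
      f"P = {two_tailed_probability(sex_ratio):.2f}")
```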


3.52. When the mean number of failures (m) per set is large, 
the Poisson distribution also approaches the normal form. Then 
the standard error per set is \sqrt{m}, the standard error of the mean 
of N sets is \sqrt{m/N}, and that of the total number of failures in 
N sets, mN, is 

    N \sqrt{\frac{m}{N}} = \sqrt{mN}. 

Thus for the distribution of yeast cells of Table 2.4, the total number 
of cells counted is 1 872, with a standard error 

    \pm \sqrt{1872} = \pm 43.3; 

the mean number of cells per square is 4.68, and its standard error is 

    \pm \sqrt{\frac{4.68}{400}} = \pm 0.108. 
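
These two standard errors may be checked as follows. The sketch is in Python and is an illustration only; the number of squares, 400, is not stated in this passage and is assumed here from the total of 1 872 cells and the mean of 4.68 per square.

```python
import math

total_cells = 1872
squares = 400   # assumed: total count divided by the stated mean of 4.68

mean_per_square = total_cells / squares
se_total = math.sqrt(total_cells)                 # sqrt(mN)
se_mean = math.sqrt(mean_per_square / squares)    # sqrt(m/N)

print(round(mean_per_square, 2), round(se_total, 1), round(se_mean, 3))
```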


Measures of Dispersion 

3.53. The standard deviation estimated from the variance has a 
standard error of 

    \frac{\sigma}{\sqrt{2N}}, 

where σ is the standard deviation of the population and N is the 
size of the sample. Its distribution is not normal, but for large samples 
it approaches normality; e.g. at N = 100, β_1 = 0.005 and 
β_2 = 3.0000. Let us see if the Scotsmen of Table 3.1 are really 
more regular in height, as well as being taller on the average, than 
the Englishmen. The standard error of a standard deviation is 
1/\sqrt{2} times that of a mean, so for the difference of standard deviations 
of the table it is 

    \frac{1}{\sqrt{2}} \times 0.075 9 = \pm 0.053 7. 

The difference (2.548 − 2.480 = 0.068) being only 1.27 times 
its standard error is not significant; in fact, random sampling would 
give as big a difference about one trial in five (P = 0.2). 
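
The same result is reached if the standard error of each standard deviation is found directly from σ/√(2N) and the two are combined as for a difference. The Python sketch below is an illustration only, with the sample standard deviations of Table 3.1 standing for the unknown population values.

```python
import math
from statistics import NormalDist

def se_standard_deviation(sd, n):
    """Large-sample standard error of a standard deviation."""
    return sd / math.sqrt(2 * n)

# Figures of Table 3.1.
sd_eng, n_eng = 2.548, 6194
sd_sco, n_sco = 2.480, 1304

se_diff = math.sqrt(se_standard_deviation(sd_eng, n_eng) ** 2
                    + se_standard_deviation(sd_sco, n_sco) ** 2)
ratio = (sd_eng - sd_sco) / se_diff
probability = 2 * (1 - NormalDist().cdf(ratio))   # both tails

print(round(se_diff, 4), round(ratio, 2), round(probability, 2))
```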

Only when found from the second moment has the standard 
deviation the above standard error. We shall now deal with its 
standard error when estimated from the range. E. S. Pearson (1926 
and 1932) has worked out the sampling distribution of the range 
fairly fully, and although for no size of sub-sample is it normal, it 
is only moderately skew for those of about 10; for sub-samples 
of 10, β_1 = +0.156 and β_2 = 3.22. In such circumstances, we 
may use the standard error as an approximate description of the 
sampling errors. Pearson gives the standard error or deviation of 
the range for several sub-samples, and for those of 10 it is 0.797σ, 
where σ is the true standard deviation. Suppose the sample of N 
contains m sub-samples of 10, so that N = 10m. Then from section 
2.71, 

    Standard Error of Mean Range = \frac{0.797 \sigma}{\sqrt{m}}. 

We also have from Table 1.2, 

    Estimated Standard Deviation = \frac{Mean Range}{3.078}, 

whence 

    Standard Error of Estimated Standard Deviation 
        = \frac{Standard Error of Mean Range}{3.078} 
        = \frac{0.797 \sigma}{3.078 \sqrt{m}} 
        = \frac{1.158 \sigma}{\sqrt{2N}}. 

This may be compared with the standard error of the estimate 
obtained from the variance. 
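
The comparison may be made numerically. In the Python sketch below, which is an illustration only, the constants 0.797 and 3.078 are those quoted above for sub-samples of 10, while σ and N are arbitrary illustrative values, since the ratio of the two standard errors does not depend on them.

```python
import math

SUBSAMPLE_SIZE = 10
SE_RANGE_FACTOR = 0.797     # standard error of the range of 10, in units of sigma
MEAN_RANGE_FACTOR = 3.078   # mean range of 10, in units of sigma

def se_from_range(sigma, n):
    """Standard error of the standard deviation estimated from the
    mean range of n / 10 sub-samples of 10."""
    m = n / SUBSAMPLE_SIZE
    return SE_RANGE_FACTOR * sigma / (MEAN_RANGE_FACTOR * math.sqrt(m))

def se_from_variance(sigma, n):
    """Standard error of the standard deviation estimated from the variance."""
    return sigma / math.sqrt(2 * n)

sigma, n = 1.0, 100   # illustrative values only
relative = se_from_range(sigma, n) / se_from_variance(sigma, n)

print(round(relative, 3))        # ratio of the two standard errors
print(round(relative ** 2, 2))   # relative size of sample for equal accuracy
```

The estimate from the range thus has a standard error about 16 per cent. greater than that from the variance, or, what comes to the same thing, needs a sample about one-third larger to give the same accuracy.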

In large samples, the standard error of the mean deviation is 

    1.068 \frac{\delta}{\sqrt{2N}} = 0.861 \frac{\sigma}{\sqrt{2N}}, 

where δ is the population value of the mean deviation. It follows 
that the standard error of the standard deviation estimated from the 
mean deviation is 

    1.253 \times 0.861 \frac{\sigma}{\sqrt{2N}}    or    1.068 \frac{\sigma}{\sqrt{2N}}. 


Constants of Shape 

3.54. The β ratios are useful for testing the departure of any data 
from normality, but their distributions in samples drawn from an 
infinite population of that form have not yet been worked out. The 
first four moments of the sampling distributions have been derived 
by Fisher (1930^), and from approximate values E. S. Pearson (1930) 
has determined appropriate empirical frequency curves of K. Pear¬ 
son’s system. Curves found in such a way usually give good approxi¬ 
mations to the actual distributions. 

To test asymmetry, γ_1 = \sqrt{β_1} (with the same sign as μ_3) is used. 
For samples of N from the normal population, this has a standard 
error of \sqrt{6/N} (to a first approximation), and the distribution itself 
is so nearly normal that a deviation of twice the standard error lies 
practically on the 0.05 level of significance. 

The standard error of β_2 is \sqrt{24/N} (to a first approximation), 
but the distribution is so skew, and this approximation is so poor, 
that it does not give a reliable test. Pearson’s tables (loc. cit.) give 
for samples from the normal population, values of β_2 on each 
side of the population value (3.0) which cut off tails of 5 and 1 per 
cent. of the whole curve; these tables should be used. From the 
discussion in section 3.41, it is suggested that the 5 per cent. level 
of significance should be taken as lying between the 5 and 1 per 
cent. values. 

For the distribution of heights of Table 1.5, we have 

    γ_1 = −0.116 ± 0.074 6, β_2 = 2.908 and N = 1 078. 

γ_1 is less than twice its standard error, and so is insignificant, while 
from Pearson’s table, when n = 1 000, the lower value of β_2 with a 
“tail” of 0.05 is 2.76, and since the value above is nearer 3 than that, 
we may conclude that the data are, as far as we can tell, from a normal 
population. 

Standard Errors of Functions of Statistical Constants 

3.55. Sometimes it is desired to calculate the standard errors of 
functions of one or more statistical constants, the errors of the 
individual constants being known. We have already had an example 
in the difference between two means; the ratio and product of two 
means, and the square of the standard deviation are other examples. 
We shall describe here an approximate method that is applicable 
to large samples. Where there are more than one constant we shall 
assume they are obtained from independent samples except in one 
instance. 

Let the population values of the constants be α and β, let the 
function be 

    \lambda = f(\alpha, \beta) 

and let the corresponding sample values for any one pair of samples 
be l, a and b, where 

    l = \lambda + \delta\lambda,  a = \alpha + \delta\alpha  and  b = \beta + \delta\beta; 

δ is used as a sign meaning a deviation. Then for the pair of samples 

    l = f(\alpha + \delta\alpha, \beta + \delta\beta). 

It can be shown that, for the purposes of deducing standard errors 
and variances, the relations between δλ, δα and δβ may be approximately 
derived by regarding them as mathematical differentials, so 
that 

    \delta\lambda = \frac{\partial\lambda}{\partial\alpha} \delta\alpha + \frac{\partial\lambda}{\partial\beta} \delta\beta. 

The degree of approximation involves neglecting quantities of the 
order 1/N compared with unity, where N is the number in the 
sample. 

The mean values of the squares of δλ, δα and δβ for all possible 
pairs of samples from the population are the squares of the corresponding 
standard errors and may be written (SE_l)^2, (SE_a)^2 and 
(SE_b)^2. Hence, by squaring the terms of the above equation and 
finding the means for all pairs of samples, 

    (SE_l)^2 = \left(\frac{\partial\lambda}{\partial\alpha}\right)^2 (SE_a)^2 + \left(\frac{\partial\lambda}{\partial\beta}\right)^2 (SE_b)^2 + 2 \frac{\partial\lambda}{\partial\alpha} \frac{\partial\lambda}{\partial\beta} [\delta\alpha \, \delta\beta], 

where [δα δβ] is the mean value of the product of the deviations 
δα and δβ for all pairs of samples. If the two samples in the pair 
are independent, this mean product is zero, as will be shown in 
Chapter VII. Hence, for independent pairs of samples, 

    (SE_l)^2 = \left(\frac{\partial\lambda}{\partial\alpha}\right)^2 (SE_a)^2 + \left(\frac{\partial\lambda}{\partial\beta}\right)^2 (SE_b)^2        (3.6) 

This equation may easily be extended to three or more statistical 
constants. 

When the function is the difference between two means, equa¬ 
tion (3.1), section 3.21, follows from (3.6) directly. 

By way of example, let us deduce the standard error of the 
variance v of a sample in terms of that of the standard deviation s, 
so that l = v, a = s and there is no b. Let the population value of 
the standard deviation be σ; then λ = σ^2 and 

    (SE_v)^2 = 4\sigma^2 (SE_s)^2 = \frac{2\sigma^4}{N}. 

As another example, let l be the ratio between two means x̄ and ȳ, 
with corresponding population values of λ, ξ and η. Then 

    l = \frac{\bar{x}}{\bar{y}}, 

    (SE_l)^2 = \lambda^2 \left\{ \frac{(SE_x)^2}{\xi^2} + \frac{(SE_y)^2}{\eta^2} \right\}. 

The standard errors of x̄ and ȳ have already been given in section 2.71. 
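
Equation (3.6) lends itself to a numerical check, the partial derivatives being found by small finite differences instead of by the calculus. The Python sketch below is an illustration under that assumption and is not the method of the text; the function `se_of_function` and its step size are choices made here, and the figures of Table 3.1 are used merely as convenient data. It recovers the standard error of a difference, and compares the numerical results for a ratio and for a variance with the expressions given above.

```python
import math

def se_of_function(f, values, standard_errors, rel_step=1e-6):
    """Equation (3.6) for independent constants, with each partial
    derivative estimated by a central finite difference."""
    total = 0.0
    for i, (value, se) in enumerate(zip(values, standard_errors)):
        step = rel_step * max(abs(value), 1.0)
        upper = list(values)
        lower = list(values)
        upper[i] = value + step
        lower[i] = value - step
        partial = (f(*upper) - f(*lower)) / (2 * step)
        total += (partial * se) ** 2
    return math.sqrt(total)

# Means and standard errors of Table 3.1 (Scotland, England).
x, se_x = 68.5456, 0.06868
y, se_y = 67.4375, 0.03238

se_difference = se_of_function(lambda a, b: a - b, [x, y], [se_x, se_y])
se_ratio = se_of_function(lambda a, b: a / b, [x, y], [se_x, se_y])
ratio_closed_form = (x / y) * math.sqrt((se_x / x) ** 2 + (se_y / y) ** 2)

# Variance in terms of the standard deviation (English sample).
sigma, n = 2.548, 6194
se_s = sigma / math.sqrt(2 * n)
se_variance = se_of_function(lambda s: s ** 2, [sigma], [se_s])
variance_closed_form = sigma ** 2 * math.sqrt(2 / n)

print(round(se_difference, 4))
print(se_ratio, ratio_closed_form)
print(se_variance, variance_closed_form)
```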

It has been shown in section 2.72 that the mean and standard
deviation estimated from the same sample are independent, and it
follows from this that equation (3.6) may be used to determine the
standard error of the coefficient of variation. If 100κ, ξ and σ are
respectively the population values of the coefficient of variation,
mean and standard deviation, corresponding to λ, α and β, and
k is the sample estimate of κ, it is easy to see from equation (3.6)
and the standard errors given in sections 2.71 and 3.53 that the
standard error of the coefficient of variation is

$$100\,\kappa\,\sqrt{\frac{2\kappa^2 + 1}{2N}}.$$

After reading Chapter VII, readers will be able to evaluate the product
term [δα δβ] and so deal with constants that are not independent.
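As a rough check on the standard error of the coefficient of variation, the following Python sketch evaluates the quoted expression and compares it with a direct application of equation (3.6). The function names are hypothetical, and the second function assumes the usual large-sample values for a normal population, namely a squared standard error of σ²/N for the mean and σ²/2N for the standard deviation.

```python
import math

def cv_standard_error(kappa, n):
    """Quoted formula: 100 * kappa * sqrt((2 * kappa^2 + 1) / (2N))."""
    return 100 * kappa * math.sqrt((2 * kappa ** 2 + 1) / (2 * n))

def cv_standard_error_from_3_6(kappa, n):
    """Same quantity built from equation (3.6) for the ratio s / mean.

    Assumes (SE_mean)^2 = sigma^2 / N and (SE_s)^2 = sigma^2 / (2N),
    so the relative variances are kappa^2 / N and 1 / (2N).
    """
    relative_variance = kappa ** 2 / n + 1 / (2 * n)
    return 100 * kappa * math.sqrt(relative_variance)

# Made-up example: coefficient of variation of 10 per cent., sample of 50.
direct = cv_standard_error(0.10, 50)
via_3_6 = cv_standard_error_from_3_6(0.10, 50)
print(direct, via_3_6)
```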
Determination of Population Value from Sample

3.6. We have seen how a statistical constant determined from a
sample may be used to test some hypothesis concerning the
population value. Sometimes, however, we are unable to postulate the
population value, of which we look to the sample to teach us
something. This process of estimating the population value from the
sample or of arguing from the particular to the general is the
inductive method, and the question arises as to the accuracy of the
estimate given by the sample. This may be answered if the sampling
distribution of the constant is known, and we shall now illustrate
the kind of answer provided by referring to samples of 100
individuals taken to measure the proportion having a given character,
i.e. the proportion of successes.

[Fig. 7. Sample value p plotted against Population Value π; not drawn to scale.]

The population value of this proportion may be denoted by π*
and any sample value by p. Then a diagram like that of Fig. 7 may
be drawn (this is not drawn to scale), where for any population
value π₁ the point B, for which p = π₁, represents the mean of all
the sample values, and the points A and C represent values of p
lying on the 5 per cent. level of significance. These points may be
found from the sampling distribution of p, which is given by the
terms of the binomial (π₁ + 1 − π₁)^100, and if we may assume this
to be approximately normal,

$$AB = BC = \text{Twice the Standard Error} = 2\sqrt{\pi_1 (1 - \pi_1) \div 100}.$$

For example, if π₁ = 0.8, B is at p = 0.8, and the standard error
is √(0.8 × 0.2 ÷ 100) = 0.04; hence A is at p = 0.8 + 0.08
= 0.88 and C is at p = 0.8 − 0.08 = 0.72. In this way, points
corresponding to A and C may be found for all possible values of π,
and these lie on the curved lines shown diagrammatically in Fig. 7.
The points corresponding to B lie on the line p = π. Then we
know that for any one value of π, 95 per cent. of the sample values
lie within the limits represented by the points on the curved lines
at which an ordinate drawn at π cuts them, and so by adding the
experience for all populations, we know that 95 per cent. of the
sample values lie within the limits of the curved lines, whatever may
be the relative frequencies with which the different values of π occur.

* Here π is not 3.14159 . . .

If in making inferences from samples it is assumed that all
sample-population points lie within the limits, i.e. that for every
sample value p₀ (say) the population value lies between π₂ and π₃,
we shall be right in 95 per cent. of the inferences. In this way, it
is possible to make an estimate from a sample of the range within
which the population value lies, with a given long-run risk of being
wrong.
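The construction of the points A and C for a known population value can be sketched in a few lines of Python. The function name `sample_limits` is an assumption made for the illustration; the calculation is the normal approximation described above, with limits at twice the standard error for samples of 100.

```python
import math

def sample_limits(pi, n=100, multiple=2.0):
    """Limits of the sample proportion p for a known population value pi.

    Uses the normal approximation to the binomial: pi -/+ multiple * SE.
    """
    se = math.sqrt(pi * (1 - pi) / n)
    return pi - multiple * se, pi + multiple * se

# Worked case from the text: population value 0.8, samples of 100.
point_c, point_a = sample_limits(0.8)
print(point_c, point_a)
```

Repeating the call over a grid of population values traces out the two curved lines of the diagram.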

Fisher (1930b) has termed the limits represented by the curved
lines the fiducial limits and the corresponding proportion of correct
inferences (95 per cent. in Fig. 7) the fiducial probability. Neyman
(1934) uses the terms confidence limits and confidence coefficient. It is
possible, of course, to determine limits corresponding to other
fiducial probabilities, e.g. for the 99 per cent. level, and for any
statistical constant that has a known continuous sampling
distribution. In order to determine the sampling distribution of the
constant, it is usually necessary to assume the form of the
population, e.g. normal, and it may be necessary to assume the standard
deviation. When the distribution is skew, difficulties of the kind
mentioned in section 3.41 have to be faced.

In deriving the limits in Fig. 7 by the methods outlined, the
result would be limits of p for known values of π; for practical
purposes it is more convenient to have limits of π for known
equidistant values of p. It is not proposed here to go into the method
of arriving at this desired form of expression (simple inverse
interpolation is effective), but it is pointed out that the limits of π
corresponding to a sample value p are not p ± 2√(p(1 − p) ÷ 100).
For example, when p₀ = 0.2, π₃ = 0.291 and π₂ = 0.132. It will
be noted that the limits of π are not even equidistant from p. These
conditions arise when the standard error of a statistical constant
depends on the population value of the constant as for the mean
of a binomial or Poisson series. When this is not so, and the standard
error is independent of the constant, the 95 per cent. fiducial limits
of the population value are two parallel straight lines, and for any
one sample value are at twice the standard error above and below
that value, assuming a normal sampling distribution. Thus the
95 per cent. limits of the mean height of Englishmen in Table 3.1
are 67.4375 ± 2 × 0.03238, i.e. 67.5023 and 67.3727 inches.
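The limits of the population value for an observed sample value can be reproduced with a short Python sketch. It is not the inverse interpolation mentioned in the text: as an assumed shortcut it solves directly the quadratic obtained by requiring the sample value to lie at twice the standard error from the population value. The function name `population_limits` is hypothetical.

```python
import math

def population_limits(p, n=100, multiple=2.0):
    """Limits of pi for an observed proportion p.

    Solves (p - pi)^2 = multiple^2 * pi * (1 - pi) / n for pi, i.e.
    (1 + c) pi^2 - (2p + c) pi + p^2 = 0 with c = multiple^2 / n.
    """
    c = multiple ** 2 / n
    a = 1 + c
    b = -(2 * p + c)
    root = math.sqrt(b * b - 4 * a * p * p)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)

lower, upper = population_limits(0.2)

# For comparison, the symmetric limits that the text warns against.
half_width = 2 * math.sqrt(0.2 * 0.8 / 100)
naive = (0.2 - half_width, 0.2 + half_width)

print(round(lower, 3), round(upper, 3), naive)
```

The two roots are unequal distances from the sample value, which is the asymmetry the paragraph describes.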

Fiducial probability is not the same as inverse probability, although
it appears superficially to be so. An inverse probability statement
would be applied to any particular instance and would be of the
form: “given a sample value p₀, the probability of the population
value lying between π₂ and π₃ is 0.95” (see Fig. 7). This involves
the conception of a super-population of population values
corresponding to the one sample value p₀, 95 per cent. of which are
between π₂ and π₃. Such a conception lies behind any statistical
interpretation of inverse probability and presents many difficulties,
one of the chief of which is that there is no basis for making any
assumption regarding the distribution in the super-population. The
fiducial probability statement says that for all possible sample values
taken from any populations, 95 per cent. of the population values
of π lie between the limits shown in Fig. 7; no assumption is made
as to the distribution of the population values.

Fiducial limits can only be calculated in the way shown in so far 
as the sampling distribution is continuous or may be approximately 
represented by a continuous distribution. 

Choice of Statistical Constants 

3.7. We have seen in Chapter I that there may be several statistical
constants for measuring one characteristic of a sample or
population; for example, the standard deviation, mean deviation, quartile
deviation and mean range are all measures of dispersion. The
question arises: Are there any grounds on which one constant is
to be preferred to another? Fisher’s theory of estimation (1922,
1925a and 1936) provides an answer, and we shall give as much of
it as seems necessary to show the kind of answer provided.

So far, we have regarded statistical constants as convenient 
measures of various characteristics of samples and populations, and 
have assumed some mathematical form of distribution for the popu¬ 
lation more or less incidentally. In the theory of estimation it is 
necessary to assume or specify some mathematical form of distribu¬ 
tion in the population as a starting-point. The equation of this 
form has one or more constants or parameters that may vary from 
one population of that form to another, and the particular values 
of which define any given population. For example, if the assumed 
form of the population is the normal curve described by
equation (2.4), the two parameters are m and σ, and given values of
these define a particular normal population. On this view, m and σ
are not thought of as the mean and standard deviation, i.e. as
measures of position and dispersion; they are thought of as
mathematical parameters. Similarly, statistical constants (Fisher calls them
statistics) calculated from samples are not descriptive averages; they 
are estimates of the parameters in the equation, and it is on this 
basis that the relative values of equivalent statistics are judged. 
Here, we need concern ourselves with two only of the criteria given 
by Fisher; they are consistency and efficiency. 

Consistency.—A consistent estimate of a parameter must equal the
parameter when it is derived from the whole distribution in the
population. In a normal population, all the above measures are
consistent estimates of σ. This criterion is an obvious one and is
satisfied by all statistical estimates in common use.

Efficiency.—This criterion applies to those statistical estimates
from large samples for which the sampling distribution is known
and is normal, so that it is completely described by the mean and
standard error.* The mean of the estimate in the sampling
distribution may differ from the parameter, but this difference may be
calculated and allowed for as bias. The standard error defines the
inaccuracy of the estimate; here it will be more convenient to deal
with the square of the standard error, viz. with the variance of the
estimate.

* As an approximation, the ideas and criteria may be applied where the
sampling distribution is nearly normal, i.e. to all constants the standard
errors of which are given in this chapter, except β₂.

For statistics of the class with which we are now dealing the
variance in samples of N may be expressed in the form

$$\frac{1}{IN},$$

where I depends on the statistic and is called its intrinsic accuracy.
For example, the intrinsic accuracy of the mean as an estimate of m
in the normal population is 1/σ² and that of the square root of the
second moment as an estimate of σ is 2/σ².

We have seen in section 3.53 that the various estimates of the
standard deviation have differing standard errors, and hence intrinsic
accuracies. Moreover, an increase in I has the same influence on
the sampling variance as a proportionate increase in N, so that
differences between intrinsic accuracies may be expressed in terms
of N. Thus, for σ estimated from the second moment, I = 2/σ²
and for the estimate obtained from the mean range in sub-samples
of 10, I = 2/(1.34σ²). The variance of the estimate from the range
in a sample of 100 is equal to that of the estimate from the second
moment in a sample of 100/1.34 = 75, for

$$\frac{1.34\,\sigma^2}{2 \times 100} = \frac{\sigma^2}{2 \times 75}.$$

The loss in accuracy due to using the range instead of the second
moment is equivalent to a 25 per cent. reduction in the size of the
sample. We may say that the estimate from the range has an efficiency
of 75 per cent. of the estimate from the second moment.
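The arithmetic of efficiency and equivalent sample size can be set out as a small Python sketch. The function name and the choice of σ = 1 are assumptions made for the illustration; the factor 1.34 is the one used in the text for the mean range in sub-samples of 10.

```python
def efficiency_and_equivalent_n(n, intrinsic_accuracy, efficient_intrinsic_accuracy):
    """Efficiency of a statistic and the sample size of an efficient
    statistic that would give the same sampling variance 1 / (I * N)."""
    efficiency = intrinsic_accuracy / efficient_intrinsic_accuracy
    return efficiency, n * efficiency

sigma = 1.0                                   # arbitrary; it cancels in the ratio
i_second_moment = 2 / sigma ** 2              # intrinsic accuracy of the second-moment estimate
i_mean_range = 2 / (1.34 * sigma ** 2)        # intrinsic accuracy implied by the factor 1.34

efficiency, n_equivalent = efficiency_and_equivalent_n(100, i_mean_range, i_second_moment)
print(round(efficiency, 2), round(n_equivalent))
```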

For each parameter, there is a class of statistics that have the
same intrinsic accuracy, that is greater than the intrinsic accuracy
of all other estimates, and these are called efficient statistics. The
ratio of the intrinsic accuracy of any given statistic to that of an
efficient one is called the efficiency of the given statistic. The mean
and standard deviation as defined in sections 1.22 and 1.23 are
efficient estimates of m and σ in the normal population.

When using one estimate of a parameter, we may regard the
sample as giving a certain quantity of information regarding the
parameter, that increases in proportion to the size of the sample.
Each individual may be regarded as contributing a unit amount of
information, and the connection between the intrinsic accuracy and
size of sample suggests that the intrinsic accuracy should be the
unit. Fisher in his later writings (1935) has referred to intrinsic
accuracy as the quantity of information given by a single observation.
In these terms, the mean range in sub-samples of 10 may be said
to give three-quarters of the amount of information given by the
second moment regarding the parameter σ. Just as for a given
estimate the number of individuals is additive in the sense that two
independent samples of N₁ and N₂ give the same amount of
information as one sample of N₁ + N₂, so for different estimates and
samples, independent quantities of information are additive.

When combining estimates of a parameter that have normal 
sampling distributions, the estimates being from different samples, 
we may take a weighted mean, using the quantities of information 
as weights. This is analogous to combining estimates of the mean, 
say, by using the numbers in the samples as weights. The variance 
of the combined estimate is the inverse of the quantity of
information. This result is used in sections 8.22 and 11.73.
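The weighting rule just described can be sketched in Python as follows. The function name and the numerical values are assumptions made for the illustration; the quantity of information carried by each estimate is taken as the inverse of its sampling variance, which is what the relation variance = 1/(IN) implies.

```python
def combine_estimates(estimates, variances):
    """Weighted mean of independent estimates, weighted by quantity of information.

    Returns the combined estimate and its variance, the latter being the
    inverse of the total quantity of information.
    """
    information = [1.0 / v for v in variances]
    total_information = sum(information)
    combined = sum(w * e for w, e in zip(information, estimates)) / total_information
    return combined, 1.0 / total_information

# Two made-up estimates of the same parameter with sampling variances 1 and 4.
combined, combined_variance = combine_estimates([10.0, 12.0], [1.0, 4.0])
print(combined, combined_variance)
```

The more precise estimate dominates the result, and the combined variance is smaller than either of the separate variances.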

There are several reasons why efficient statistics should be used 
except in special circumstances. The use of an inefficient statistic 
is equivalent to throwing away part of the data, and such a course 
is usually only justifiable if there is a proportionate saving in labour, 
say in computation. When efficient statistics are used, mistaken 
inferences of the second kind mentioned in section 3.3 are reduced 
to a minimum. Further, populations are estimated with maximum 
precision by the use of efficient statistics, for these have fiducial 
limits corresponding to a given fiducial probability, that are closer 
than for any other estimates of the same parameter. The methods 
of the next chapter apply only when efficient estimates are used. 

A distinction must here be made between different statistical 
constants that are merely mathematical transformations of one 
constant, and constants that are essentially different estimates and 
are only related statistically. For example, the variance and standard 
deviation are mathematically equivalent; samples that have the 
same variance must necessarily have the same standard deviation 
and the two constants are equally efficient. On the other hand, the
standard deviation, mean deviation, and mean range in sets of 10
are only statistically related in that the relations given in section 1.23
are true only on the average and do not necessarily apply to any
particular sample; different samples having the same standard
deviation may have different mean deviations and mean ranges.
These estimates do not have the same efficiency.


Method of Maximum Likelihood 

3.8. This is a general method for arriving at estimates of parameters
of distributions, and Fisher has shown that estimates given by this 
method are efficient, provided efficient statistics of the parameter 
exist. 

We have already seen in section 2.72 how the probability of a 
given sample may be deduced when the population and its para¬ 
meters are known. When the form of population is known the 
expression for that probability (without the differential terms) for 
any assumed values of the parameters is termed the likelihood of 
the assumed values. The particular values that make this likelihood 
a maximum are the maximum likelihood estimates. They are obtained 
by differentiating the logarithm of the likelihood with respect to 
the parameters, equating to zero and solving the equations. 

For the normal distribution with assumed parameters m̂ = x̄ and
σ̂ = s* and a sample of N, the logarithm of the likelihood (to the
base e) is:

$$L = -\frac{N}{2}\log 2\pi - N\log s - \frac{1}{2s^2}\,\mathop{S}_{i=1}^{N}\,(x_i - \bar{x})^2. \qquad (3.7)$$

If this is differentiated with respect to x̄ and the differential equated
to zero, it is easily deduced that

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_N}{N}.$$

Thus, the mean of the sample, as defined in section 1.22, is the
maximum likelihood estimate of m. Similarly, by differentiating
L in (3.7) with respect to s and equating to zero, the maximum
likelihood estimate of σ is found to be the standard deviation as
defined in section 1.23.
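A numerical check of this result is easy to set up. The Python sketch below uses made-up observations and hypothetical names throughout. It evaluates the normal log-likelihood at the sample mean and at the root-mean-square deviation with divisor N (which is what setting the derivative with respect to s to zero yields), and confirms that small displacements of either value lower the likelihood.

```python
import math

def log_likelihood(xs, mean, sd):
    """Normal log-likelihood for assumed mean and standard deviation."""
    n = len(xs)
    return (-n / 2 * math.log(2 * math.pi)
            - n * math.log(sd)
            - sum((x - mean) ** 2 for x in xs) / (2 * sd ** 2))

xs = [4.1, 5.3, 4.8, 6.0, 5.2, 4.6]          # made-up sample
n = len(xs)
mean_hat = sum(xs) / n
sd_hat = math.sqrt(sum((x - mean_hat) ** 2 for x in xs) / n)   # divisor N

best = log_likelihood(xs, mean_hat, sd_hat)

# Likelihood at nearby values of the two parameters.
nearby = [log_likelihood(xs, mean_hat + dm, sd_hat + ds)
          for dm in (-0.05, 0.0, 0.05)
          for ds in (-0.05, 0.0, 0.05)
          if (dm, ds) != (0.0, 0.0)]

print(best, max(nearby))
```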

* The Latin letters denote estimates from the sample and the sign ^
denotes the particular kind of estimate dealt with in this section.

3.81. When the data are in the form of a frequency distribution
with frequencies n₁, n₂ . . . n_s . . . in the groups, the
logarithm of the likelihood is

$$L = S(n_s \log m_s), \qquad (3.8)$$

where S is the summation over all groups and m_s is the estimated
frequency in the sth group, determined from the equation for the
assumed population and expressed in terms of the unknown
parameters.

For example, for the Poisson distribution the proportion of
frequency with s successes is

$$e^{-\mu}\,\frac{\mu^s}{s!}.$$

If m̂ is the maximum likelihood estimate of μ, and the total number
in the sample is N,

$$L = S\left\{ n_s \log\left( N e^{-\hat{m}}\,\frac{\hat{m}^s}{s!} \right) \right\}.$$

By equating to zero the first differential of this with respect to m̂
we have

$$\hat{m} = \frac{S(s\,n_s)}{S(n_s)} = \frac{S(s\,n_s)}{N}.$$

Thus, the efficient estimate of μ is the mean of the distribution.
This result is not without interest, since we have seen in section 2.3
that the second moment of the distribution is also a possible
estimate of μ, and hitherto no rational grounds have been advanced
for regarding it as an inferior estimate.
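The Poisson case can be verified numerically with a short Python sketch. The frequencies below are invented for the illustration and the names are hypothetical. The sketch evaluates the grouped log-likelihood of equation (3.8) over a grid of trial values and compares the maximising value with the mean of the distribution.

```python
import math

def poisson_log_likelihood(freqs, m):
    """L = S{ n_s * log(N * e^-m * m^s / s!) } for frequencies n_0, n_1, ..."""
    n_total = sum(freqs)
    return sum(n_s * (math.log(n_total) - m + s * math.log(m) - math.lgamma(s + 1))
               for s, n_s in enumerate(freqs))

freqs = [30, 40, 20, 8, 2]                   # made-up frequencies of 0, 1, 2, 3, 4 successes
n_total = sum(freqs)

# Closed form: the mean of the distribution.
m_hat = sum(s * n_s for s, n_s in enumerate(freqs)) / n_total

# Crude grid search over trial values 0.500, 0.501, ..., 2.000.
grid = [0.5 + 0.001 * i for i in range(1501)]
m_grid = max(grid, key=lambda m: poisson_log_likelihood(freqs, m))

print(m_hat, m_grid)
```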

The method of maximum likelihood may be used whenever a 
distribution can be expressed in terms of parameters that are required 
to be estimated from the sample. Fisher (1936a) has used it in 
genetical studies for estimating linkages. 


3.82. When an efficient statistic has a normal sampling distribution,
the square of the standard error or variance in a large sample may
be obtained from the second differential of the logarithm of the
likelihood L. Generally, if κ is the parameter, k̂ the maximum
likelihood estimate, and σ_k̂ is the standard error of k̂,

$$\frac{1}{\sigma_{\hat{k}}^2} = -\left[\frac{\partial^2 L}{\partial \hat{k}^2}\right], \qquad (3.9)$$

where the square brackets [ ] signify “the mean for all possible
sample values of the term contained therein.”

For the normal distribution, on differentiating equation (3.7)
twice, we find

$$\frac{\partial^2 L}{\partial \bar{x}^2} = -\frac{N}{s^2},$$

and the mean value of s² for all possible samples is the population
value σ², whence

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{N}.$$

This is an alternative derivation of the square of the standard error
of the mean.
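The second differential can also be approximated numerically. The Python sketch below uses made-up observations and hypothetical names, and it substitutes the sample value of s² for the population value σ², so it illustrates only the differentiation step. It takes a central second difference of the normal log-likelihood with respect to the assumed mean and inverts it.

```python
import math

def log_likelihood(xs, mean, sd):
    """Normal log-likelihood for assumed mean and standard deviation."""
    n = len(xs)
    return (-n / 2 * math.log(2 * math.pi)
            - n * math.log(sd)
            - sum((x - mean) ** 2 for x in xs) / (2 * sd ** 2))

xs = [4.1, 5.3, 4.8, 6.0, 5.2, 4.6]          # made-up sample
n = len(xs)
mean_hat = sum(xs) / n
sd_hat = math.sqrt(sum((x - mean_hat) ** 2 for x in xs) / n)

# Central second difference of L with respect to the mean.
h = 1e-3
second_differential = (log_likelihood(xs, mean_hat + h, sd_hat)
                       - 2 * log_likelihood(xs, mean_hat, sd_hat)
                       + log_likelihood(xs, mean_hat - h, sd_hat)) / h ** 2

variance_of_mean = -1 / second_differential
print(variance_of_mean, sd_hat ** 2 / n)
```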
4.1. So far we have been using as expressions of differences between
distributions, constants like the mean, standard deviation, β₁ and β₂,
which summarise their chief properties or are estimates of their
parameters. Except for populations of particular forms, these do not
express all the features of the distributions. It is desirable to have
some index which measures the degrees of difference between the
actual frequencies in the groups, and so compares all the essential
features. Such is K. Pearson's (1900) χ², which we shall first use to
measure the deviations of an experimental distribution from the form
of some hypothetical population. Both must be grouped in the same
way, and the theoretical distribution must be adjusted to give the
same total frequency; then if ν_s is the number of observations in any
one group in the theoretical distribution, and n_s is the corresponding
number in the experimental one,

$$\chi^2 = S\left\{ \frac{(n_s - \nu_s)^2}{\nu_s} \right\}, \qquad (4.1)$$

where S is the summation over all groups. It will be appreciated
that since (n_s − ν_s) is squared, all differences in frequency, whether
positive or negative, add a positive amount to χ², and further that the
greater these differences are, the greater is χ²; if the two distributions
are exactly alike, χ² is zero.


Sampling Distribution of χ²

4.2. In using χ² to test whether one distribution differs from the
other, we must remember that because of random errors, χ² will
never (or hardly ever) be zero, and we need to know its sampling
distribution so that we can tell the probability of an observed χ²
being given by a random sample from the hypothetical population.
As usual, if that probability (which is symbolised by P) is low enough,
the χ² is said to be significant, and it is unreasonable to suppose that
such a significant value could be a result of sampling errors alone.
Subject to certain restrictions, the distribution of χ² is given by the
equation,

$$df = C\,\chi^{g-1}\,e^{-\frac{1}{2}\chi^2}\,d\chi, \qquad (4.2)$$

where df is the element of frequency between ordinates drawn at
χ and χ + dχ, the frequency between χ = χ₀ and χ = χ₁ being

$$\int_{\chi_0}^{\chi_1} df,$$

C is a constant and g is the number of degrees of freedom.

This conception of degrees of freedom is not altogether easy to
attain, and we cannot attempt a full justification of it here; but we
shall show its reasonableness and shall illustrate it, hoping that as
a result of familiarity with its use the reader will appreciate it. Here
the number of degrees of freedom is the number of groups, modified.
Clearly, since the error in each group in the experimental distribution
contributes a positive amount to χ², the greater the number of groups
the larger would χ² be expected to be, as a result of random variations
alone, and account of this is taken through the quantity g. Further,
it is often the practice to fit the theoretical distribution to the
observations by calculating constants from the sample, just as in the
example of section 2.51 we fitted a normal curve by making its
mean, standard deviation and total equal to those of the sample. If
we wish to test the adequacy of the theoretical form, further account
must be taken of the degree to which we have made it fit the
observations by this method. Suppose, in an extreme case, there were
g′ groups and we fitted a curve involving g′ constants which were
calculated from the data; then the two distributions would agree
exactly and χ² would be zero because sampling errors would have
had no play. To take account of this second factor, we must subtract
from the number of groups (say g′) the number of constants that
have been determined from the data in fitting,* in order to obtain
the degrees of freedom (g). Every constant so determined has the
effect, from the point of view of χ², of reducing the number of groups
by one, and the number of degrees of freedom may be regarded
effectively as the number of independent groups remaining to
contribute to χ². When only the totals have been made equal, g = g′ − 1;
but in a case like the fitting of a binomial to the number of germinating
seeds in Table 2.3, the theoretical distribution has been adjusted to
make its mean and total both equal to the sample, and g = g′ − 2.

* Provided these constants are determined by processes which satisfy the
criterion of efficiency.

One restriction under which equation (4.2) for the sampling
distribution of χ² applies is that no group contains very few
individuals; 10 is about the lower limit. To arrive at the distribution
of χ² experimentally, we should have to take many samples. Now the
probability of any random individual belonging to the sth group
(say) is ν_s/N, where N is the total number in the sample, and
consequently in all the samples, the actual numbers falling into the sth
group (n_s) will vary according to the binomial

$$\left( \frac{\nu_s}{N} + 1 - \frac{\nu_s}{N} \right)^{N}$$

(this is a direct application of the binomial theory given in section
2.2). If ν_s is not too small, this binomial approximates to the normal
distribution, and it is on the assumption that n_s is distributed
normally that equation (4.2) is obtained. This is the reason why in
using χ² no group should contain much fewer than ten individuals,
and to satisfy this in practice the tail groups, which often have very
low frequencies, are combined.

A second restriction is the usual one that all the individuals in
the sample must be independent. Indeed, χ² may be used as a test
of independence.

In its sampling distribution, χ² may vary between zero and plus
infinity, and since there is no question of negative deviations, and
we are asking if the observed χ² is really greater than zero, the
value at which an ordinate cuts off a tail of 5 per cent. also lies on the
5 per cent. level of significance. Since, in general practice, we should
not test a very small value of χ² for the significance of its deviation
from the average, we do not include the tail near χ² = 0 in the test
for significance; this is in accordance with the principles discussed
in section 3.32.

There are two sets of tables of the probability integral of equation
(4.2). The first, calculated by Elderton, is included in Pearson’s
Tables for Statisticians and Biometricians, and gives the areas of
the tails beyond integral values of χ² for different values of n′, where
n′ is one more than the number of degrees of freedom (= g + 1).
This was originally calculated for the case where the theoretical
distribution is only made to agree with the data in respect of its total,
and was wrongly applied to other cases for some time before the
necessity for correcting to degrees of freedom (which was pointed out
by Fisher in 1922b) was realised. The other tables by Fisher (1936a)
give values of χ² on different levels of significance, and in these
the degrees of freedom are used directly and are called n.

The χ² test is more universal in its application than most others
in that there is no assumption made as to the normality of the
distributions being compared.

First let us test the fit of the normal curve to the distribution
of heights of men in Table 1.5; the arithmetical operations are set
out in Table 4.1, and it will be noticed that the tail groups have
been lumped together, giving 14 groups. Three constants have been
fitted (total, mean and standard deviation), leaving 11 degrees of
freedom, and we wish to see if the χ² (8.59) resulting from those
11 can be attributed to random variations. Entering Elderton’s table
at n′ = 12, we find P = 0.66, and this is so large that we might
reasonably suppose the deviations to have arisen from errors of
random sampling, and we say that the normal curve gives a good fit.
Indeed, 66 random samples in 100 would have given a χ² equal to or
greater than 8.59. In computing χ², the total of the fourth column
(= 0) checks the accuracy of the subtractions; and the terms in the
sixth column are the products of those in the fourth and fifth.

TABLE 4.1

                    Frequencies
Stature in      Observed   Expected   n_s − ν_s   (n_s − ν_s)/ν_s   (n_s − ν_s)²/ν_s
Inches           (n_s)      (ν_s)

below 61.5        14.5        11.8        2.7          0.229              0.62
61.5–             17          17.7       −0.7         −0.040              0.03
62.5–             33.5        35.5       −2.0         −0.056              0.11
63.5–             61.5        62.8       −1.3         −0.021              0.03
64.5–             95.5        96.7       −1.2         −0.012              0.01
65.5–            142         130.1       11.9          0.091              1.08
66.5–            137.5       153.0      −15.5         −0.101              1.57
67.5–            154         157.1       −3.1         −0.020              0.06
68.5–            141.5       141.0        0.5          0.004              0.00
69.5–            116         110.5        5.5          0.050              0.27
70.5–             78          75.7        2.3          0.030              0.07
71.5–             49          45.2        3.8          0.084              0.32
72.5–             28.5        23.7        4.8          0.203              0.97
over 73.5          9.5        17.2       −7.7         −0.448              3.45

Total           1078        1078.0        0.0            —           χ² = 8.59
                                                                     n′ = 12

Similarly, the distribution of germinating seed in Table 2.3 gives
a χ² of 2.897 when the groups containing four or more seeds are
combined; there are five groups and three degrees of freedom (the
mean and total have been equalised), and entering Elderton’s table
at n′ = 4, we find P = 0.41, showing that the fit is a good one.
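The arithmetic of Table 4.1 can be reproduced with a short Python sketch. The function names are hypothetical, and instead of Elderton's or Fisher's tables the probability P is computed from the standard closed-form series for the upper tail of the χ² distribution with a whole number of degrees of freedom. Working from the tabulated frequencies rather than from the rounded column entries gives a χ² that differs from 8.59 only in the second decimal place.

```python
import math

def chi_square(observed, expected):
    """Equation (4.1): sum of (n_s - v_s)^2 / v_s over all groups."""
    return sum((n - v) ** 2 / v for n, v in zip(observed, expected))

def chi2_tail(x, df):
    """Probability that chi-square with integer df exceeds x."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite series in chi = sqrt(x).
    chi = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total

observed = [14.5, 17, 33.5, 61.5, 95.5, 142, 137.5, 154, 141.5, 116, 78, 49, 28.5, 9.5]
expected = [11.8, 17.7, 35.5, 62.8, 96.7, 130.1, 153.0, 157.1, 141.0, 110.5, 75.7, 45.2, 23.7, 17.2]

chi2_heights = chi_square(observed, expected)
df_heights = len(observed) - 3               # total, mean and standard deviation fitted
p_heights = chi2_tail(chi2_heights, df_heights)

# Germinating seeds: chi-square of 2.897 on three degrees of freedom.
p_seeds = chi2_tail(2.897, 3)

print(round(chi2_heights, 2), df_heights, round(p_heights, 2), round(p_seeds, 2))
```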

An experimental distribution of flower colours and the Mendelian
expectations are shown in Table 4.2 (Wheldale, 1907).
There are 7 groups, and as only the total has been found from the
sample (the expected frequencies, being determined from Mendelian
theory, do not depend on the experimental data in any other way),
there are 6 degrees of freedom (n′ = 7); χ² = 9.81 and P = 0.13;
hence the deviations of experiment from expectation cannot be
regarded as significant.

TABLE 4.2

Colour of Flowers      Expected      Experimental
                       Frequency     Frequency

Magenta                 118.34          107
Magenta delila           39.44           42
Ivory                    52.59           67
Crimson                  39.44           42
Yellow                   17.53           24
Crimson delila           13.15           12
White                    93.51           80

Total*                  374.00          374
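The figures of Table 4.2 give the quoted χ² directly, as the following Python sketch shows. The names are hypothetical, and the probability is computed from the closed-form expression for the upper tail of χ² with an even number of degrees of freedom rather than read from tables.

```python
import math

def chi_square(observed, expected):
    """Equation (4.1): sum of (n_s - v_s)^2 / v_s over all groups."""
    return sum((n - v) ** 2 / v for n, v in zip(observed, expected))

def chi2_tail_even(x, df):
    """Upper tail for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

expected = [118.34, 39.44, 52.59, 39.44, 17.53, 13.15, 93.51]
experimental = [107, 42, 67, 42, 24, 12, 80]

chi2_flowers = chi_square(experimental, expected)
df_flowers = len(expected) - 1               # only the total is taken from the sample
p_flowers = chi2_tail_even(chi2_flowers, df_flowers)

print(round(chi2_flowers, 2), df_flowers, round(p_flowers, 2))
```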

The Additive Nature of χ²

4.3. A useful property of χ² is that several values can be added
together, and if the degrees of freedom are also added, the total
deviations of several distributions can be tested by entering the
tables at the total degrees of freedom and finding the probability
corresponding to the total χ². For instance, for the three distributions
we have just tested, the total χ² is 21.30, the total degrees of freedom
are 20, and P for this value of χ² and n′ = 21 is 0.38; thus, taken
altogether, the differences are still attributable to random errors. This is
the correct way of combining the experiences of several distributions,
and it is quite wrong to find an average P or χ².

* Table 4.2: The figures given in the paper add up to 373.98, and we
have added 0.01 of a unit to each of the largest groups to make the total
correct. There are also two groups with expected frequencies of zero, but
these must be left out in this test.
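The addition of the three values can be followed in a few lines of Python. The function name is hypothetical; the three values of χ² and their degrees of freedom are those quoted in the text, and the probability comes from the closed-form upper tail for an even number of degrees of freedom.

```python
import math

def chi2_tail_even(x, df):
    """Upper tail for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

# (chi-square, degrees of freedom) for heights, germinating seeds and flower colours.
results = [(8.59, 11), (2.897, 3), (9.81, 6)]

total_chi2 = sum(value for value, _ in results)
total_df = sum(df for _, df in results)
p_total = chi2_tail_even(total_chi2, total_df)

print(round(total_chi2, 2), total_df, round(p_total, 2))
```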

To illustrate further this additive character of χ², Table 4.3 has
been compiled from some data of Mendel, quoted by Bateson (1913),
and gives the numbers of round and angular peas from ten plants,
together with the ratios of the numbers of round peas to the totals.

TABLE 4.3
Frequencies of Peas

Plant Number    Round    Angular    Total    Ratio      χ²
                 n_r       n_a        N      n_r/N

 1                45        12        57     0.7895     0.47
 2                27         8        35     0.7714     0.09
 3                24         7        31     0.7742     0.10
 4                19        10        29     0.6552     1.39
 5                32        11        43     0.7442     0.00
 6                26         6        32     0.8125     0.67
 7                88        24       112     0.7857     0.76
 8                22        10        32     0.6875     0.67
 9                28         6        34     0.8235     0.98
10                25         7        32     0.7812     0.17

Total            336       101       437
Expected         327.75    109.25    437.00

These ratios vary considerably, and we will see how far the
variations may be explained on the hypothesis that the peas from the
ten plants are effectively ten random samples from an infinite
population in which the ratio is 0.75 (this is the Mendelian expectation).
We might calculate for each plant the quantity

$$\frac{\text{Deviation in Ratio from } 0.75}{\text{Standard Error}},$$

and since the sampling distribution of each ratio is binomial, this is

$$\frac{\dfrac{n_r}{N} - 0.75}{\sqrt{\dfrac{0.75\,(1 - 0.75)}{N}}}.$$

If the deviations were random, these ratios would be distributed
approximately normally with unit standard deviation, and none of the
ten would be expected to exceed 2.0. Alternatively we may calculate
ten values of χ², and since there are two frequency groups, each
plant contributes one degree of freedom. In this simple case, χ²
happens to be the square of the above ratio.

Fisher gives the value of χ² on the 0.05 level of significance
as 3.841 for one degree of freedom, and as none of those in Table
4.3 is as large as this, no individual plant differs significantly from
expectation. Further, for all plants combined, the proportion of
round peas is 0.7689, giving a χ² of 0.83, which also is insignificant.
It may be, however, that the variability in the proportion of round
peas from plant to plant is greater than can be explained by random
errors, when all are considered together. To make this clear, we may
imagine an extreme case in which the values of χ² for all the plants are
near the level of significance, but the ratio n_r/N varies above and below
expectation, so that the ratio for all plants combined is near
expectation. Then, although no individual plant appears to differ
significantly from expectation, it is unlikely that all would be so near the
level of significance if it were not that the deviations as a whole were
real, but in different directions, so that when added they average
out. To test such a point we may add the values of χ², giving a total
of 5.30 for 10 degrees of freedom (n′ = 11) so that P = 0.87, and
we conclude from the data that the plants do not vary significantly,
and they may be regarded as so many random samples of peas.
Had there been enough plants, instead of adding the values of χ²
we could have formed a frequency distribution of them, and have
compared it with the theoretical form for one degree of freedom.

The x^ t:est is thus an extension of the ordinary method of finding 
the significance of a single binomial mean, and enables the information 
given by a number of them to be combined. 
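
The arithmetic of this section can be retraced in a short Python sketch. It is an illustration only: the function names are ours, the counts are those of plants 3 to 10 as they appear in Table 4.3, and the probability for the total is obtained from the closed form of the $\chi^2$ tail for an even number of degrees of freedom instead of from Fisher's table.

```python
import math

def chi2_binomial(round_peas, angular_peas, p=0.75):
    """Chi-square (1 d.f.) for one plant against the expected ratio p."""
    total = round_peas + angular_peas
    ratio = round_peas / total
    std_error = math.sqrt(p * (1 - p) / total)
    # With two frequency groups, chi-square is the square of
    # (deviation in ratio) / (standard error).
    return ((ratio - p) / std_error) ** 2

def chi2_tail_even_df(x, df):
    """P(chi-square > x) for an even number of degrees of freedom."""
    if df % 2:
        raise ValueError("this closed form needs an even number of d.f.")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(df // 2))

# (round, angular) counts for plants 3 to 10 of Table 4.3
plants = {3: (24, 7), 4: (19, 10), 5: (32, 11), 6: (26, 6),
          7: (88, 24), 8: (22, 10), 9: (28, 6), 10: (25, 7)}
per_plant = {plant: chi2_binomial(*counts) for plant, counts in plants.items()}

# P for the total chi-square of 5.30 on 10 degrees of freedom
p_total = chi2_tail_even_df(5.30, 10)
```

Each entry of `per_plant` may be compared with the last column of the table, and `p_total` with the P quoted above.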

4.31. It follows, of course, that if a number of values of $\chi^2$ and their
degrees of freedom may be added and treated as one, a total $\chi^2$
may be split up into parts, and each part may be tested separately.

Contingency Tables 

4.4. The problem of the sex-ratio of rats (in Table 3.5, p. 83) may
be looked at in a different way. We are not concerned with the total
numbers of males, of females, or of young resulting from the two
diets, but with the distribution of young in the four cells giving the
two sex-ratios. If diet has had no effect on the sex-ratio, the
observations would be expected to be distributed at random in the
four cells, with the one restriction that they should add up to give
the totals of the table. In the infinite population of tables with those
totals, the probability of an observation falling in the “deficient”
group is 276/571, and that of it falling in the male group is 268/571,
so that the probability of it falling in the “deficient” male square is

$$\frac{276}{571} \times \frac{268}{571},$$

and the expected number of individuals in that square is that
probability multiplied by 571 = 129.54. Similarly, the other squares can


TABLE 4.4
Expected Frequencies

Diet                      Males     Females    Total

Vitamin B deficient       129.54    146.46     276.00
Vitamin B sufficient      138.46    156.54     295.00

Total                     268.00    303.00     571.00

be filled as in Table 4.4; they are the frequencies that would be
expected if sex were independent of diet and the individuals were
distributed at random. We may now test to see if Table 3.5 differs
significantly from Table 4.4 by finding $\chi^2$ for the four cells, and then
the P for one degree of freedom. There is only one degree of freedom,
since only one cell can be filled independently; the numbers in the
others can be obtained from that one and the totals. In our example,
$\chi^2 = 1.20$ and P = 0.28 (from Fisher’s table); the deviations are
not greater than can be attributed to random errors. It will be noticed
that this probability is practically the same as that obtained previously
from the standard error of the ratios (0.27); indeed, it should be, for
both methods are equivalent, being based on the assumption that
the number in a cell (or its ratio to the total) is distributed normally.

Table 3.5 is an example of a fourfold contingency table. When the
individuals in a sample have two characters, and a frequency table
is made classifying them according to both so as to show the relation




between the characters, the result is a contingency table. In Table 3.5,
the two characters are sex and vitamin B content of diet, and complete
information regarding these for any one rat is given by its position
in the table. If the numbers in the cells are not randomly distributed,
the two characters are said to be associated; the sex of the offspring
in our example is not associated with the vitamin B content of the
diet of the parent. Contingency tables may be manifold, and if there


TABLE 4.5

                                     Severity of Attack

Years since       Haemor-                                      Very
vaccination       rhagic    Confluent   Abundant    Sparse     Sparse     Totals

0-10                 0          1           6         11         12          30
                  (0.87)     (5.13)      (9.02)     (8.60)     (6.38)

10-25                5         37         114        165        136         457
                 (13.25)    (78.20)    (137.45)   (130.96)    (97.14)

25-45               29        155         299        268        181         932
                 (27.04)   (159.47)    (280.32)   (267.07)   (198.10)

Over 45             11         35          48         33         28         155
                  (4.50)    (26.52)     (46.62)    (44.42)    (32.94)

Unvaccinated         4         61          41          7          2         115
                  (3.34)    (19.68)     (34.59)    (32.95)    (24.44)

Total               49        289         508        484        359        1689


are n rows and m columns, there are $(n - 1)(m - 1)$ degrees of
freedom.
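
As a hedged illustration of how expected frequencies follow from the marginal totals alone, the sketch below rebuilds the entries of Table 4.4 and counts degrees of freedom by the rule just given. The function names are ours and are not taken from any library.

```python
def expected_frequencies(row_totals, col_totals):
    """Expected cell counts under independence, given the marginal totals."""
    grand = sum(row_totals)
    if grand != sum(col_totals):
        raise ValueError("row and column totals must give the same grand total")
    return [[r * c / grand for c in col_totals] for r in row_totals]

def degrees_of_freedom(n_rows, n_cols):
    """(n - 1)(m - 1) for a table of n rows and m columns."""
    return (n_rows - 1) * (n_cols - 1)

# Marginal totals of Table 3.5: diets (deficient, sufficient),
# then sexes (males, females)
table_4_4 = expected_frequencies([276, 295], [268, 303])
```

Only one cell of the fourfold table is free, which is what `degrees_of_freedom(2, 2)` returns.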

Table 4.5 is a 5 X 5 contingency table, being Brownlee’s data of 
the severity of smallpox attack and degree of vaccination (quoted 
by Pearson, 1910); below the frequencies, the expectations for inde¬ 
pendence are given in brackets. 

As there are several expected frequencies less than ten in the first
row and column, these have been combined with the second row and
column in working out $\chi^2$, so forming a 4 × 4 table. $\chi^2 = 196.33$,
there are 9 degrees of freedom, and P is less than 0.000001; hence
the association between degree of vaccination and severity of attack
is overwhelmingly significant. Notice, however, that this gives no
information as to the strength of association; a smaller sample would
give a lower significance for the same association, and a larger sample
would give a higher significance. Indeed, this test does not even tell
us whether increased severity of attack is associated with a longer
or a shorter period since vaccination. Having established, however,
that the effect is real, we glance at the table again, and notice that
for the more recently vaccinated patients there tends to be a deficit
of severe attacks and an excess of milder ones; for the remotely
vaccinated, the reverse is the case, and thus we establish the sense
of the association; a measure of its strength will be discussed in
section 9.5. Readers must be warned that association does not
necessarily mean a causal relationship; it only means that in the
population sampled there is a tendency for variations in the
two characters to occur together. A much closer analysis is
necessary to investigate causes; this also will be discussed later
in section 7.3.
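
The working for Table 4.5 may be sketched as follows. This is our own minimal version, not the original computation: the helper names are illustrative, and because the expectations are carried to full precision here the result can differ in the last figures from the value quoted above.

```python
def chi_square(table):
    """Chi-square for independence in a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for r, row in zip(row_totals, table):
        for c, observed in zip(col_totals, row):
            expected = r * c / grand
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof

def merge_first_two(table):
    """Combine the first two rows and the first two columns of a table."""
    rows = [[a + b for a, b in zip(table[0], table[1])]]
    rows += [list(row) for row in table[2:]]
    return [[row[0] + row[1]] + row[2:] for row in rows]

# Observed frequencies of Table 4.5; rows are 0-10, 10-25, 25-45 and
# over 45 years since vaccination, then unvaccinated
table_4_5 = [
    [0, 1, 6, 11, 12],
    [5, 37, 114, 165, 136],
    [29, 155, 299, 268, 181],
    [11, 35, 48, 33, 28],
    [4, 61, 41, 7, 2],
]
chi2, dof = chi_square(merge_first_two(table_4_5))
```

Rearranging the rows or columns of `table_4_5` before calling `chi_square` leaves the result unchanged, since each cell enters the sum on its own.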

The test of association in Table 4.5 may be looked at in another
way. The table consists effectively of a number of frequency
distributions of years since vaccination, and $\chi^2$ is the measure of the
deviations of these distributions from hypothetical ones deduced
from the “totals” column and differing only in total frequencies.
The independent “constants” of these hypothetical distributions are
four of the proportionate frequencies in the totals column; the fifth
may be obtained from the others, since they all add up to unity.
In addition to these, the five totals in the last row give the five totals
of the hypothetical distributions, so that altogether nine “constants”
have been fitted, leaving 16 degrees for a 5 × 5 table. Alternatively,
the table may be regarded as a collection of five distributions in
which the variate is the severity of attack.

For expressing and testing association, the contingency table is
exceedingly useful, since the characters need only be described
qualitatively, and need not be given in any particular order. A
rearrangement of the rows or columns in Table 4.5 does not affect
the result.


4.41. A useful application of the $\chi^2$ test is to the comparison of a
number of frequency distributions, e.g. for the purpose of checking
sampling technique. The separate distributions may be regarded as
rows in a contingency table, and if the $\chi^2$ is large enough, they are
significantly different. As a special case of this, when there are two
distributions and $g'$ groups in each, there are $g' - 1$ degrees of
freedom, and the expression for $\chi^2$ becomes

$$\chi^2 = N_1 N_2\, S\left\{\frac{\left(\dfrac{n_{1s}}{N_1} - \dfrac{n_{2s}}{N_2}\right)^2}{n_{1s} + n_{2s}}\right\} \qquad (4.3)$$

where $N_1$ and $N_2$ are the two totals (they need not be equal),

$n_{1s}$ and $n_{2s}$ are frequencies in one corresponding group in
the two distributions,

and S is the summation over all groups.

The grouping must, of course, be the same for both distributions 
under comparison. These tests are alternative to those involving 
only the means and standard deviations, but are more complete, since 
they compare the distributions in all respects. 
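
A small sketch may make the comparison of two distributions concrete. It assumes the usual two-sample form of the statistic, $N_1 N_2\, S\{(n_{1s}/N_1 - n_{2s}/N_2)^2 / (n_{1s} + n_{2s})\}$; the function name is ours and the frequencies are invented purely for illustration.

```python
def chi2_two_distributions(freq_1, freq_2):
    """Chi-square for two frequency distributions with the same grouping,
    returned with its degrees of freedom (number of groups less one)."""
    if len(freq_1) != len(freq_2):
        raise ValueError("the grouping must be the same for both distributions")
    if any(a + b == 0 for a, b in zip(freq_1, freq_2)):
        raise ValueError("combine or drop groups that are empty in both")
    n1_total, n2_total = sum(freq_1), sum(freq_2)
    total = sum((a / n1_total - b / n2_total) ** 2 / (a + b)
                for a, b in zip(freq_1, freq_2))
    return n1_total * n2_total * total, len(freq_1) - 1

# Hypothetical frequencies in four groups; the totals need not be equal
sample_a = [12, 30, 41, 17]
sample_b = [20, 55, 50, 10]
chi2, dof = chi2_two_distributions(sample_a, sample_b)
```

The same figure is obtained by treating the two distributions as the rows of a contingency table and summing over its cells in the ordinary way.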


General Notes on the Distribution of $\chi^2$

4.5. The distribution of $\chi^2$ is of considerable general importance in
statistical theory, and a proper understanding of its nature is desirable.
We have introduced it as it first appeared historically, as a measure
of the differences between two frequency distributions. More
generally, however, if we have any quantity x which is distributed normally
about zero with unit standard deviation, and we add the squares of
g independent values of x, the distribution of this sum in the infinite
population is that of $\chi^2$ for g degrees of freedom.

Thus, the distribution of $\chi^2$ is closely related to that of N times
the variance, $s^2$, in samples of N from a normal population in which
the standard deviation is unity. Indeed, the sum of squares of g
independent deviations from the population mean is equivalent to g + 1
deviations from the sample mean, so that if in equation (2.8), p. 66,
we write

$$N = g + 1, \qquad s^2 = \frac{\chi^2}{g + 1} \qquad \text{and} \qquad \sigma = 1$$

in the distribution of s, the equation reduces to the form of (4.2).
Tables of the probability integral of $\chi^2$ only go up to g = 30, but
for larger values of g it is sufficient to assume $\sqrt{2\chi^2}$ to be normally
distributed about a mean of $\sqrt{2g - 1}$ with unit standard deviation
(Fisher, 1936a). Notice, however, that here we are asking not “is
the observed $\sqrt{2\chi^2}$ significantly different from the mean $\sqrt{2g - 1}$?”
but rather “is the observed $\sqrt{2\chi^2} - \sqrt{2g - 1}$ significantly greater
than zero?” and consequently the deviation of $\sqrt{2\chi^2}$ which has a tail
5 per cent. of the whole curve is equivalent to the 5 per cent. level of
significance. Such a deviation is 1.65 times the standard error, and
$\sqrt{2\chi^2} - \sqrt{2g - 1}$ has to be greater than 1.65 to be significant.
When g = 30, $\chi^2$ lying on the 5 per cent. level on this assumption
is 43.5, whereas Fisher’s tables based on the correct distribution
give 43.773, so the approximation is close enough for large
samples.
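
The approximation for large g is easily put into code. The sketch below is ours (the function names are illustrative) and uses the one-tailed deviate of 1.65 quoted above.

```python
import math

def chi2_point_large_g(g, normal_deviate=1.65):
    """Approximate chi-square for g degrees of freedom whose upper tail
    corresponds to the given one-tailed normal deviate."""
    root = normal_deviate + math.sqrt(2 * g - 1)
    return root ** 2 / 2

def normal_deviate_for(chi2, g):
    """sqrt(2 chi2) - sqrt(2g - 1), to be treated as a unit normal deviate."""
    return math.sqrt(2 * chi2) - math.sqrt(2 * g - 1)

# Approximate 5 per cent. point for g = 30, for comparison with the tabled value
approx_5_percent = chi2_point_large_g(30)
```

An observed $\chi^2$ on many degrees of freedom is judged by passing it to `normal_deviate_for` and asking whether the result exceeds 1.65.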



CHAPTER V 


SMALL SAMPLES 

It is not always possible to obtain large samples, such as are suitable 
for application of the methods of the last chapters, so the theory has 
been developed to give more exact methods suitable for small num¬ 
bers. We stated in section 3.1 that there is nothing in the definition 
of a random sample that depends on its size, and would reiterate 
this in view of the existence of a fairly common impression to the 
contrary, of which the following quotation is typical. 

“Moreover, in agricultural experiment, the number of observations
rarely, if ever, is sufficiently large to allow full play to the laws of
chance.”

This is a misconception, for the laws of chance (as we know them) 
have full play every time a single individual is drawn from the popu¬ 
lation, and the theory of random sampling may be applied to small 
samples. There are two kinds of modification of the theory for large 
samples. The first consists in developments to the theory of esti¬ 
mation, which do not alter the general principles described in 
section 3.7. We need not concern ourselves with them any further. 
The second kind of modification is of practical importance and 
consists in using exact sampling distributions instead of the approxi¬ 
mate ones used with large samples. We shall deal with this more fully. 

Unfortunately, although the limitation of size has been removed,
that of form has not, and most of the following results apply only to
quantities distributed normally.

Variance Estimated from Small Samples 

5.1. With a few observations, it is futile to form a frequency
distribution, but the usual frequency constants may be calculated, and
regarded as estimates, obtained from the sample, of the constants of
the infinite population. For the normal population, the mean is found
in exactly the same way as for large samples, but the best estimate
of the variance ($\sigma^2$) is obtained by dividing the sum of the squares
of the deviations from the mean, not by the number of observations,
but by the number of degrees of freedom. Here the number of degrees
of freedom is the number of deviations minus the number of constants
determined from the sample and used to fix the points from
which those deviations are measured; in the simple case, when the
mean only is found from the sample, the degrees of freedom are one
less than the number of observations. We will illustrate this by the
data of Table 1.3, which are the 100 random observations from an
artificially constructed population, divided into groups of 5. We may
find the variance either from the squares of the deviations from
the grand mean or, regarding each group of five as a small sample,
from the squares of the deviations from the sample means. For
the latter process, instead of finding the variance for each sample
separately, we may sum the squares of all the deviations and divide
by the total degrees of freedom contributed by the twenty samples.
The sum of squared deviations from the grand mean is 8864.75,
and on dividing this by 99 we obtain the variance, viz. 89.5. The
sum of squares from the sample means is 7056.4, and since each
sample contributes four degrees of freedom, there are 80 degrees
altogether and the variance is 88.2. There is quite a fair agreement
between the two estimates.* If we had used the old method for
large samples we should have obtained values

$$\frac{8864.75}{100} = 88.6 \qquad \text{and} \qquad \frac{7056.4}{100} = 70.6,$$

with a much poorer agreement. In a large sample, the difference
between dividing by N and (N − 1) is quite unimportant.

As a special case, it may be determined that the mean variance for
a number of pairs is

$$\frac{S(x_1 - x_2)^2}{2M}$$

where $x_1$ and $x_2$ are the individuals of any pair, M is the number
of pairs and S is the summation over all pairs. In this way, the
variance of any character between brothers from the same parents can
be obtained as accurately from pairs of brothers from a hundred
families as from a single family of one hundred and one brothers.
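
The two ways of estimating a variance described in this section may be sketched as below. The function names are ours; the two quotients use the sums of squares quoted for Table 1.3, and the pairs are hypothetical values invented for illustration.

```python
def pooled_variance(samples):
    """Sum of squared deviations from each sample mean, divided by the
    total degrees of freedom contributed by the samples."""
    sum_sq, dof = 0.0, 0
    for sample in samples:
        mean = sum(sample) / len(sample)
        sum_sq += sum((x - mean) ** 2 for x in sample)
        dof += len(sample) - 1
    return sum_sq / dof

def variance_from_pairs(pairs):
    """S(x1 - x2)^2 / 2M for M pairs."""
    return sum((x1 - x2) ** 2 for x1, x2 in pairs) / (2 * len(pairs))

# Sums of squares quoted in the text for the data of Table 1.3
grand_mean_estimate = 8864.75 / 99      # deviations from the grand mean
within_sample_estimate = 7056.4 / 80    # deviations from the sample means

# Hypothetical pairs (say, measurements on two brothers from each family)
pairs = [(5.1, 4.7), (6.0, 6.4), (5.5, 5.0), (4.9, 5.3)]
```

Treating each pair as a sample of two, `pooled_variance(pairs)` and `variance_from_pairs(pairs)` give the same figure, since each pair contributes one degree of freedom and a sum of squares of half the squared difference.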


5.11. The justification for using the new estimate when the deviations
are measured from the mean arises mathematically from the

* We have not tested the agreement by the ordinary sampling theory, 
using the standard errors of the values, since the values are not independent; 
they have been taken from the same data. 




where $K'$ is a constant and N is the size of sample. The mean
value of $v'$ obtained by integrating this distribution is

$$\frac{N - 1}{N}\,\sigma^2.$$

Thus, as an estimate of $\sigma^2$, $v'$ tends on the average to give a biassed
value that is slightly smaller than the population value. It is preferable
in dealing with small samples to use the unbiassed estimate,

$$s^2 = \frac{N v'}{N - 1} = \frac{S(x - \bar{x})^2}{N - 1}.$$
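
The bias of $v'$ can be verified exactly in a small case. The sketch below is an assumption-laden illustration rather than the integration referred to above: it replaces the normal population by a hypothetical population of four equally likely values and enumerates every sample of three drawn with replacement, using exact fractions.

```python
from fractions import Fraction
from itertools import product

def mean_of_biased_variance(population, size):
    """Average of v' = S(x - mean)^2 / N over every equally likely sample
    of the given size, drawn with replacement from a small population."""
    total, count = Fraction(0), 0
    for sample in product(population, repeat=size):
        mean = Fraction(sum(sample), size)
        total += sum((x - mean) ** 2 for x in sample) / size
        count += 1
    return total / count

# Hypothetical population, chosen only to keep the enumeration small
population = [1, 2, 3, 4]
pop_mean = Fraction(sum(population), len(population))
sigma_sq = sum((x - pop_mean) ** 2 for x in population) / len(population)

N = 3
average_v = mean_of_biased_variance(population, N)
```

Here `average_v` is (N − 1)/N times `sigma_sq`, so multiplying $v'$ by N/(N − 1) removes the bias on the average.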


Significance of Means: The t Test

5.2. When testing the significance of the deviation of a sample mean
from an assumed population value, we used the fact that the ratio
of the deviation to its standard error is distributed normally with
unit standard deviation.* If d is the deviation, $\sigma$ is the population
value of the standard deviation and N is the size of sample, this
ratio is

$$\frac{d}{\sigma / \sqrt{N}}.$$

Now, in practice we do not always know $\sigma$ and have to substitute
the sample estimate, s. When dealing with large samples, s and $\sigma$
are sufficiently alike to make this course reasonably accurate; but
when the samples are small, some allowance must be made for the
error involved in using s instead of $\sigma$.

* The ratio is the w of equation (2.5), section 2.51.

The ratio by which we test the significance of a deviation may
be written

$$t = \frac{d}{s / \sqrt{N}},$$

and if we know the sampling distribution of t, we may carry out
the test exactly, without being limited to large samples. This was
first pointed out by “Student” (1908) and he gave some tables of
the sampling distribution of a quantity z closely related to t. These
have now been generally superseded by tables of t given by “Student”
(1925) and Fisher (1936a). When computing t for use with Fisher’s
tables, s must be taken as the square root of the variance calculated
correctly from the sample by dividing the sum of squares of
deviations by the degrees of freedom. Thus,

$$t = (\bar{x} - \xi)\sqrt{\frac{N(N - 1)}{S(x - \bar{x})^2}}$$

where $\xi$ is the population mean.

The sampling distribution of t is symmetrical and depends only
on the number of degrees of freedom on which s has been estimated;
it approaches the normal form as the degrees of freedom increase.
Fisher’s tables give, for different degrees of freedom, called n, values
of t lying on various levels of significance, the levels being the sum
of the two “tails” of the distribution as for the third column of our
Table 2.5.

For large samples, t = 2.0 lies on the 0.05 level of significance;
other values that lie on the same level are: t = 12.7 for samples
of 2 (n = 1), t = 4.3 for those of 3 (n = 2), t = 2.8 for those of 4
(n = 3) and t = 2.3 for those of 10 (n = 9); the effect of the
non-normality of t is serious for samples much smaller than 20.

Table 5.1 shows the effect of a small electric current on the
growth of maize seedlings, giving the difference between the
elongation of the treated and untreated in parallel pairs of boxes (data from
Collins, Flint and McLane, 1929), a positive difference showing that
the electrical treatment increased the rate of growth. The mean is
4.29 mm.; is this significantly different from zero? The sum of
squares of deviations from the mean is 937.189, and since on our
hypothesis $\xi = 0$,

$$t = 4.29\sqrt{\frac{90}{937.189}} = 1.33$$

and n = 9; according to the tables, t = 1.38 lies on the 0.2 level
of significance, and there is thus no evidence from the sample that
the treatment has made any difference to growth. The mean
elongation is not large enough compared with the variations between those
of the separate boxes to be significant.
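
The calculation of t from summary figures may be sketched as follows. The function name is ours; the sample size of ten is inferred from the nine degrees of freedom stated above, and only the mean and sum of squares quoted in the text are used.

```python
import math

def t_from_summary(mean, hypothesised_mean, sum_sq_dev, n_obs):
    """t for a single sample mean, given the sum of squared deviations
    about that mean; returns t and its degrees of freedom."""
    dof = n_obs - 1
    s = math.sqrt(sum_sq_dev / dof)   # estimate of the standard deviation
    t = (mean - hypothesised_mean) / (s / math.sqrt(n_obs))
    return t, dof

# Summary figures quoted for the maize seedlings
t, dof = t_from_summary(mean=4.29, hypothesised_mean=0.0,
                        sum_sq_dev=937.189, n_obs=10)
```

The value of `t` is then compared with the tabled value for `dof` degrees of freedom at the chosen level of significance.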


TABLE 5.1

Elongation in mm. (Treated − Untreated)

0.0    1.3    10.2    23.9    3.1    6.8    -1.5    -14.7    -3.3

Mean 4.29

Essential Character of t

5.21. Essentially, t is the ratio of a quantity distributed normally
about a mean of zero, to an estimate based on n degrees of freedom,
of the standard error of that quantity. Any such ratio, however it
arises, has the same sampling distribution as t.


Significance of Differences between Means

5.3. The quantity t may be used for testing the significance of the
difference between two sample means. We shall deal here only with
the situation in which the assumption is made that the samples
are from populations having a common standard deviation $\sigma$ as well
as the same mean.* Then, if $N_1$ and $N_2$ are the numbers in the
two samples, and $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $(\bar{x}_1 - \bar{x}_2)$ is
distributed normally about zero and an estimate of its standard
deviation (or error) is

$$s\sqrt{\frac{1}{N_1} + \frac{1}{N_2}},$$

where s is obtained by summing the squares of the deviations from
the two sample means, dividing by the total degrees of freedom
and finding the square root. Thus

$$s^2 = \frac{S_1(x_1 - \bar{x}_1)^2 + S_2(x_2 - \bar{x}_2)^2}{N_1 + N_2 - 2} \qquad (5.2)$$

where $S_1$ and $S_2$ are summations over the two samples and
$x_1$ and $x_2$ are individuals in the two samples. Then

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}}, \qquad (5.3)$$

which in large samples is distributed normally with unit standard
deviation, is in small samples distributed as Fisher’s t, the degrees
of freedom being $N_1 + N_2 - 2$.

Our example is from data provided by Corkill (1930) showing the
effect of insulin on rabbits; the results for separate animals are given
in Table 5.2. The difference in means is not large compared with
the variations within each sample, and a statistical test of
significance is necessary. The sums of squares of the deviations are 0.2530
and 0.0715, and thus

$$s^2 = \frac{0.3245}{19} = 0.01708, \qquad s = 0.1307;$$

* Since normality is also assumed, the hypothesis and assumptions
amount to the composite hypothesis that the two populations are one.
Judging from experience, however, the tests are not very sensitive to
moderate departures from normality nor to small differences in standard
deviation.

there are ten and eleven rabbits in the two samples, so

$$\sqrt{\frac{1}{10} + \frac{1}{11}} = 0.4369$$

and

$$t = \frac{0.138}{0.1307 \times 0.4369} = 2.41.$$


TABLE 5.2

Muscle Glycogen (per cent.)

Controls        After Insulin

0.19            0.15
0.18            0.13
0.21            Trace*
0.30            0.07
0.66            0.27
0.42            0.24
0.08            0.19
0.12            0.04
0.30            0.08
0.27            0.20
-               0.12

Means 0.273     0.135

* Assumed to be 0.00.
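
The value of t for the rabbits can be retraced from the individual entries of Table 5.2. The sketch below is ours: the function name is illustrative, the entries are read as alternate values for controls and insulin-treated animals, and “Trace” is entered as 0.00 in accordance with the footnote.

```python
import math

def two_sample_t(sample_1, sample_2):
    """t for the difference between two means, assuming a common variance;
    returns t and the degrees of freedom N1 + N2 - 2."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = sum(sample_1) / n1, sum(sample_2) / n2
    sum_sq = (sum((x - mean_1) ** 2 for x in sample_1)
              + sum((x - mean_2) ** 2 for x in sample_2))
    dof = n1 + n2 - 2
    s = math.sqrt(sum_sq / dof)       # pooled estimate of the standard deviation
    t = (mean_1 - mean_2) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, dof

controls = [0.19, 0.18, 0.21, 0.30, 0.66, 0.42, 0.08, 0.12, 0.30, 0.27]
after_insulin = [0.15, 0.13, 0.00, 0.07, 0.27, 0.24, 0.19, 0.04, 0.08, 0.20, 0.12]

t, dof = two_sample_t(controls, after_insulin)
```

The function pools the two sums of squares before dividing, which is what distinguishes this test from two separate single-sample calculations.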

The mean variance has been calculated on 19 degrees of freedom,
and for these conditions, t = 2.09 lies on the 0.05 level of
significance; thus the effect of insulin on muscle glycogen appears to
be real.

The difference between this and the earlier example of the effect
of electrical treatment on growth is that here there is no reason for
taking controls and treated animals in pairs; they are all independent.
In the former instance, boxes were treated in parallel pairs, and there
was some reason for expecting that these would be subject to some
of the same disturbing factors, which thus would not affect the
differences.

Significance of Differences between Variances

5.4. When testing the difference between variabilities for large
samples in section 3.53, we assumed $(s_1 - s_2)$, the difference in
standard deviations of the samples, to be distributed normally with
a standard error of

$$\sqrt{\frac{\sigma^2}{2N_1} + \frac{\sigma^2}{2N_2}},$$

where $\sigma$ is the standard deviation in the population from which the
samples are presumed to be taken; and not knowing $\sigma$, we substituted
$s_1$ and $s_2$ in the expression. Errors arising from this approximation
are again important in small samples, but Fisher (1924a and 1936a)
has suggested a new index:

$$z = \tfrac{1}{2}\left(\log_e s_1^2 - \log_e s_2^2\right) = \log_e \frac{s_1}{s_2}, \qquad (5.4)$$

where $s_1^2$ and $s_2^2$ are the two variances calculated on the degrees of
freedom. For samples of $N_1$ and $N_2$ drawn from the same
population, if $n_1$ and $n_2$ are the numbers of degrees of freedom, z is
distributed in the form

$$df = A\,\frac{e^{n_1 z}\,dz}{\left(n_1 e^{2z} + n_2\right)^{\frac{1}{2}(n_1 + n_2)}},$$

where $n_1$ and $n_2$ are the degrees of freedom ($= N_1 - 1$ and $N_2 - 1$)
and A is a constant.* This distribution, containing as variables
only z, $n_1$ and $n_2$, is independent of the standard deviation of the
population, $\sigma$, and since it involves no approximating assumptions,
is applicable to small samples. The quantity z may vary between
plus and minus infinity, being negative when $s_1/s_2$ is less and
positive when $s_1/s_2$ is greater than unity, and unless $n_1 = n_2$, is skew.
The positive part of the curve of $z = \log_e (s_1/s_2)$, however, is the same as
the negative part of $z = \log_e (s_2/s_1)$, and so the probability integrals for
positive deviations only are sufficient for any combination of degrees
of freedom; the others can be obtained by interchanging $n_1$ and $n_2$. It
is simpler, however, not to deal with negative values of z, but always
to take the difference of logarithms so that it is positive, and hence to
choose $n_1$ to be the degrees of freedom on which the larger variance

* This distribution of z is related to that of s and hence to that of $\chi^2$ (see
Fisher, 1924a).

is measured. Fisher (1936a) gives values of z at which ordinates cut
off “tails” of 5, 1 and 0.1 per cent. of the total area of the curve
for values of $n_1$ and $n_2$ chosen according to the above convention.*
These points, however, are not the corresponding levels of
significance as we understand them. The hypothesis is that the two variances
are estimates of one and the same population value, and that z
(which is the difference between the natural logarithms of the two
standard deviations) is zero. Applying the arguments of section 3.32,
we regard as lying on the 0.05 level of significance values of z at
which ordinates cut off tails, each of which is about 0.025 of the
whole area; the 5 per cent. level of significance, therefore, lies
somewhere between the 5 and 1 per cent. points of Fisher’s tables, unless
there is some a priori reason for supposing the variance estimated
by $s_1^2$ is either equal to or greater than that estimated by $s_2^2$
(compare p. 77).

When $n_1$ and $n_2$ are large, or when they are moderate and nearly
equal, the distribution of z becomes nearly normal with a standard
error of

$$\sqrt{\frac{1}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$

and in such circumstances a value of z greater than twice this value
lies above the 5 per cent. level of significance. For example, when
$n_1 = n_2 = 24$, this standard error becomes 0.204, and a z of 0.408
lies on the 5 per cent. level (that is between 0.3425 and 0.4890,
at which according to Fisher’s tables P = 0.05 and 0.01). In order
to check the assumption of normality, we may compare the deviation
at which an ordinate cuts off a tail of 5 per cent. of the whole curve
(1.65 times the standard error = 1.65 × 0.204 = 0.337) with
0.3425; the agreement is quite good.

For the heights of Englishmen and Scotsmen of Table 3.1, p. 74,
the standard deviations are 2.54[?] and 2.480, so $z = \log_e 1.027 = 0.0266$,
and using the above approximation its standard error is

$$\sqrt{\frac{1}{2}\left(\frac{1}{6194} + \frac{1}{1304}\right)} = 0.0215;$$

* Mahalanobis (1932) gives similar tables for the 5 and 1 per cent. points
of the ratio of the two estimates, and by using these, the rather
troublesome transformation of equation (5.4) may be avoided. Tables D1 and
D2 at the end of this book give corresponding values of $\log_{10} s_1^2/s_2^2$.

z is only 1.23 times its standard error and is not significant. The
other test in section 3.53 gave the difference in standard deviations
as 1.27 times its standard error, and this shows that in very large
samples it is as well to use the difference between standard
deviations as z.
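
The large-sample standard error of z may be sketched in a few lines. The function name is ours, and the heights example takes z as $\log_e 1.027$, the rounded ratio given in the text, with 6194 and 1304 as the divisors.

```python
import math

def z_standard_error(n1, n2):
    """Approximate standard error of z for n1 and n2 degrees of freedom,
    valid when both are large or moderate and nearly equal."""
    return math.sqrt(0.5 * (1 / n1 + 1 / n2))

# Case of 24 degrees of freedom for each variance
se_24 = z_standard_error(24, 24)

# Heights of Englishmen and Scotsmen
z_heights = math.log(1.027)
se_heights = z_standard_error(6194, 1304)
ratio = z_heights / se_heights
```

The `ratio` is the number of standard errors by which z departs from zero, to be judged against the normal deviate for the chosen level.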

As an illustration of a small sample, we will test the difference in
variability of muscle glycogen between the controls and
insulin-treated rabbits of Table 5.2. The sums of squares are 0.2530 and
0.0715, the variances are 0.02811 and 0.00715, $z = \tfrac{1}{2}\log_e 3.93 = 0.684$,
$n_1 = 9$ and $n_2 = 10$.

Fisher’s tables do not give z for these values of $n_1$ and $n_2$, but
interpolation is made easy by the fact that for any one value of $n_2$,
changes in z are nearly proportional to $1/n_1$. Linear interpolation
with respect to $1/n_1$ is therefore appropriate. We shall write $z_5$ and
$z_1$ for values on the 5 and 1 per cent. points.

Then, for $n_2 = 10$:

when $n_1 = 8$, $1/n_1 = 0.12500$, $z_5 = 0.5611$, and $z_1 = 0.8104$;

when $n_1 = 12$, $1/n_1 = 0.08333$, $z_5 = 0.5346$, and $z_1 = 0.7744$;

and the differences are

diff.$(1/n_1) = 0.04167$, diff.$(z_5) = 0.0265$, and diff.$(z_1) = 0.0360$.

Hence, when $n_1 = 9$, $1/n_1 = 0.11111$,

$$z_5 = 0.5611 - \frac{0.01389}{0.04167} \times 0.0265 = 0.5523$$

and

$$z_1 = 0.8104 - \frac{0.01389}{0.04167} \times 0.0360 = 0.7984.$$

The value z = 0.684 lies between the 5 and 1 per cent. points, but
it is doubtful if it is on the 5 per cent. level of significance, and we
can only say that although the data suggest that the effect of insulin
has been to make the muscle glycogen percentages more regular,
further observations are necessary to establish the fact.

Interpolation with respect to $n_2$ is scarcely ever necessary, since
the tables are given for $n_2$ increasing by units between 1 and 30.
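
The interpolation just described may be sketched as follows. The function names are ours; the tabled points are those quoted above for $n_2 = 10$, and nothing is taken from Fisher's tables beyond them.

```python
import math

def fisher_z(variance_1, variance_2):
    """z: half the difference of the natural logarithms of two variances."""
    return 0.5 * (math.log(variance_1) - math.log(variance_2))

def interpolate_in_reciprocal(n, n_low, z_low, n_high, z_high):
    """Linear interpolation of a tabled z with respect to 1/n."""
    fraction = (1 / n_low - 1 / n) / (1 / n_low - 1 / n_high)
    return z_low - fraction * (z_low - z_high)

# Rabbits of Table 5.2: sums of squares divided by their degrees of freedom,
# the larger variance being taken first
z = fisher_z(0.2530 / 9, 0.0715 / 10)

# Tabled points quoted for n2 = 10, at n1 = 8 and n1 = 12
z5 = interpolate_in_reciprocal(9, 8, 0.5611, 12, 0.5346)
z1 = interpolate_in_reciprocal(9, 8, 0.8104, 12, 0.7744)
```

The observed `z` falls between `z5` and `z1`, which is the position discussed in the text.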

Significance of Variations between Several Samples 

5.5. The method of section 5.3 tests the hypothesis that two samples
are from populations having the same mean, assuming they have
the same standard deviation. When there are more than two
samples, the hypothesis may best be tested by the analysis of
variance described in the next chapter.

It is sometimes useful to test the significance of the general
differences between the variances of more than two samples. For
instance, it may be desired to combine data from several
experiments or sources, and before doing so, it is advisable to ascertain
that the variance within each set of data is sensibly the same; or
again, when examining the products of a factory statistically, any
difference in the variability of the articles produced on different
machines, sections, or times is taken as evidence of lack of control.

Neyman and E. S. Pearson (1931) have proposed an index $L_1$ for
testing this. If there are k estimates of variance, $s_1^2, s_2^2, \ldots, s_k^2$, each
based on N observations ($n = N - 1$ are the degrees of freedom),
the index is*

$$L_1 = \frac{\left(s_1^2\, s_2^2 \cdots s_k^2\right)^{1/k}}{\dfrac{1}{k}\left(s_1^2 + s_2^2 + \cdots + s_k^2\right)}.$$

Thus, $L_1$ is the ratio of the geometric to the arithmetic mean of the
variances. Mahalanobis (1933) and Nayer (1936) have tabled values
of $L_1$ lying on the 5 and 1 per cent. levels of significance for various
values of k and N.† Nayer’s tables are fuller than the others. One
property of $L_1$ is that it decreases in value as the variances differ
more among themselves; it has a maximum value of unity when
the variances are equal.

* Our notation differs from that of Neyman and Pearson.

† Mahalanobis and Nayer write n for the number of observations in
a sample, corresponding to our N. In applying this test, we shall consistently
use the small letter for degrees of freedom. Nayer writes f for our n.

In a series of six experiments (Tippett, 1934), the following six
variances of cotton yarn breakages were obtained: 0.2056, 0.5578,
0.6459, 0.3378, 0.6194 and 0.4311; each was based on 9 degrees
of freedom. Before combining the results of these experiments, it
was considered necessary to test the variances for homogeneity.
Here, k = 6, N = 10 and $L_1 = 0.930$. From Nayer’s tables,
$L_1 = 0.81$ lies on the 5 per cent. level, and remembering that a
high $L_1$ corresponds to greater uniformity of variance, we conclude
that the value of 0.93 is not significant.
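
The index is simple to compute. The sketch below is ours (the function name is illustrative) and takes $L_1$ as the ratio of the geometric to the arithmetic mean of the variances, which assumes that every variance is based on the same number of observations.

```python
import math

def l1_index(variances):
    """Ratio of the geometric mean to the arithmetic mean of the variances."""
    k = len(variances)
    geometric = math.exp(sum(math.log(v) for v in variances) / k)
    arithmetic = sum(variances) / k
    return geometric / arithmetic

# The six variances of cotton yarn breakages, each on 9 degrees of freedom
variances = [0.2056, 0.5578, 0.6459, 0.3378, 0.6194, 0.4311]
l1 = l1_index(variances)
```

Since a high index means uniform variances, `l1` is significant only when it falls below the tabled value for the given k and N.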

Small Samples from the Binomial and Poisson Distributions 

5 . 6 . In Chapter II we used the binomial and Poisson series to test 
the homogeneity of a large number of counts of seeds and yeast 
cells; but it is not always possible to obtain such large samples from 
one population. However, if there are small samples of replicate 
counts from a variety of populations, the experience so obtained can 
be reduced to a common measure and added. 

5.61. We will develop the tests for the binomial by reference to
Table 4.3. The numbers of round and angular peas and the totals
may be regarded as forming a 2 × 10 contingency table, and obtaining
the expectations from the totals (i.e. ignoring the Mendelian
expectation), it may be tested for randomness by means of the
$\chi^2$ calculated on 9 degrees of freedom as shown in section 4.4.
If this value of $\chi^2$ is significant, the data are neither random
nor homogeneous. If there are several such tables obtained (say) from
a number of varieties of peas in which the expectations are not
necessarily equal, all the values of $\chi^2$ and their degrees of
freedom may be added, and the sum tested in the usual way. Further,
there may be enough tables with the same number of degrees of freedom
(say a hundred or more) to form a frequency distribution of $\chi^2$,
and this may be compared with the theoretical form obtained from the
standard tables. The constant $\chi^2$ is thus the common measure to
which a variety of experiences may be reduced.

The expression for $\chi^2$ becomes particularly simple if the number
of peas from each plant is constant. Using more general language, if
there are g' parallel sets, each of n trials, with
$m_1, m_2, \ldots m_{g'}$ successes in the sets and a mean of m
successes,* then

\[ \chi^2 = \frac{S(m_s - m)^2}{m\left(1 - \dfrac{m}{n}\right)} \qquad (5.6) \]

and there are (g' − 1) degrees of freedom. This, like the $\chi^2$
obtained in section 4.3 under a slightly different assumption, is a
measure of the variation between plants. If we had a number of such
sets of 10 for different kinds of plants, we could find the $\chi^2$
for each set, and compare their distribution with the theoretical one
for 9 degrees of freedom. In the same way, for instance, the
uniformity of conditions in seed germination can be investigated from
a large experience of many sorts of seed, as long as there is the
same number of replicates (g') on each seed.

* In Table 4.3 the g' sets are the 10 plants, the n trials are the
total peas on each plant (we are now dealing with the case where n is
constant), and the $m_1, m_2, \ldots$ successes are the numbers of
round peas.
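
As a hedged illustration of equation (5.6), the following Python sketch computes the index for any number of sets with a constant number of trials. The function name and the two germination counts are hypothetical, chosen only to show the arithmetic; they are not data from the experiments described.

```python
def binomial_dispersion_chi2(successes, n_trials):
    """Index of dispersion for parallel sets of n_trials each (equation 5.6)."""
    sets = len(successes)
    m = sum(successes) / sets                       # mean number of successes
    ss = sum((m_s - m) ** 2 for m_s in successes)   # S(m_s - m)^2
    chi2 = ss / (m * (1 - m / n_trials))
    return chi2, sets - 1                           # chi-square and degrees of freedom


# Hypothetical example: two lots of 50 seeds with 38 and 45 germinating.
chi2, df = binomial_dispersion_chi2([38, 45], 50)

# For two sets the general expression agrees with the two-lot form
# (m1 - m2)^2 / [(m1 + m2)(1 - (m1 + m2)/2n)].
two_lot = (38 - 45) ** 2 / ((38 + 45) * (1 - (38 + 45) / 100))
print(round(chi2, 4), df, round(two_lot, 4))
```

Values of the index from several such sets, each with its degrees of freedom, may then be added as the text describes.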

In one series of experiments made to find the effect of various
treatments on cotton seeds, 100 seeds were exposed in two lots of
50 for each treatment. In such a case, if $m_1$ and $m_2$ are the
numbers germinating in the two lots of 50,

\[ \chi^2 = \frac{(m_1 - m_2)^2}{(m_1 + m_2)\left(1 - \dfrac{m_1 + m_2}{100}\right)} \]

and there is one degree of freedom (g' = 2). There were 52
treatments, giving a total $\chi^2$ of 82.84. The tables do not extend
to so many degrees of freedom, so we may use the approximation
mentioned in section 4.5, assuming

\[ \sqrt{2\chi^2} - \sqrt{2g - 1} \]

to be normally distributed with unit standard deviation. Here g = 52,
and

\[ \sqrt{2\chi^2} - \sqrt{2g - 1} = 2.72, \]

corresponding to P = 0.003; this suggests that there must have been
some lack of uniformity in conditions. The frequency distribution
of $\chi^2$ and the theoretical form as found from Fisher's table for
one degree of freedom are compared in Table 5.3; as Fisher gives
only a few probability integrals, the intervals of $\chi^2$ are
unequal. There are obviously too many pairs of counts with large
values of $\chi^2$, and to test the divergences we must use the rather
overworked constant, $\chi^2$, again, calculating, as shown in
section 4.1, a $\chi'^2$ according to equation (4.1) as a measure of
the differences between the two distributions of Table 5.3.* In doing
this, we have combined the groups in pairs according to the brackets,
and obtain $\chi'^2$ = 7.86, which for three degrees of freedom (i.e.
four combined groups and only the total of the theoretical
distribution has been adjusted) lies almost exactly on the 0.05 level
of significance. This test of randomness is less stringent than the
one above, which adds all the values of $\chi^2$, for it groups all
those over 2.706 together, and takes no account of how much greater
they may be.

* This use of $\chi'^2$ may be a little confusing, but if readers for
the moment forget that the variate of Table 5.3 is $\chi^2$, and just
regard the table as two distributions, the differences between which
are to be tested by calculating $\chi'^2$, all will be well.
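
The approximation can be checked numerically. This Python sketch assumes only the figures quoted in the text (a total of 82.84 on 52 degrees of freedom); the function names are illustrative, and the tail area is taken from the normal integral by way of `math.erfc`.

```python
import math


def chi2_normal_deviate(chi2_total, df):
    """sqrt(2 chi^2) - sqrt(2 df - 1), treated as a unit normal deviate."""
    return math.sqrt(2 * chi2_total) - math.sqrt(2 * df - 1)


def upper_tail(deviate):
    """Area of the unit normal curve beyond the given deviate."""
    return 0.5 * math.erfc(deviate / math.sqrt(2))


deviate = chi2_normal_deviate(82.84, 52)
p_value = upper_tail(deviate)
print(round(deviate, 2), round(p_value, 3))
```

The deviate comes to about 2.72 and the single-tail probability to about 0.003, in agreement with the values given.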

TABLE 5.3

                              Frequencies
      $\chi^2$            Expected    Observed

0      – 0.0158             5.2          5 }
0.0158 – 0.0642             5.2          1 }

0.0642 – 0.148              5.2          3 }
0.148  – 0.455             10.4         10 }

0.455  – 1.074             10.4         10 }
1.074  – 1.642              5.2          5 }

1.642  – 2.706              5.2          8 }
over 2.706                  5.2         10 }

Total                      52.0         52
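
The pooling of the groups in pairs can be sketched as follows. The observed column is read from Table 5.3, with the first and sixth entries taken as 5 each, which is what the total of 52 requires; the function name is hypothetical, and equation (4.1) is assumed to be the usual sum of (observed − expected)²/expected.

```python
def pooled_chi2(expected, observed, group_size=2):
    """Chi-square between two distributions after pooling adjacent groups."""
    chi2 = 0.0
    for i in range(0, len(expected), group_size):
        e = sum(expected[i:i + group_size])
        o = sum(observed[i:i + group_size])
        chi2 += (o - e) ** 2 / e
    return chi2


expected = [5.2, 5.2, 5.2, 10.4, 10.4, 5.2, 5.2, 5.2]
observed = [5, 1, 3, 10, 10, 5, 8, 10]

chi2_pooled = pooled_chi2(expected, observed)
df_pooled = len(expected) // 2 - 1   # four pooled groups, only the total fitted
print(round(chi2_pooled, 2), df_pooled)
```

This gives about 7.87 on three degrees of freedom, which agrees with the 7.86 of the text to within rounding.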



5.62. For the Poisson distribution, the index corresponding to that
of equation (5.6) is

\[ \chi^2 = \frac{S(m_s - m)^2}{m} \]

where $m_s$ is any individual count, m is the mean, and S is the
summation over all counts. Fisher, Thornton and Mackenzie (1922)
first used this to test the accuracy of bacterial counts and found
significant deviations from expectation. More recently, Smith and
Prentice (1929) have used it to check their technique of cyst counts
in soil. They had 73 sets of 10 counts from different samples of soil,
and calculated the 73 values of $\chi^2$. The distribution of this is
compared with the theoretical one for 9 degrees of freedom below in
Table 5.4, and the expectations, being obtained from Fisher's book
(1936a), are for unequal intervals of $\chi^2$.
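
A minimal Python sketch of this index for one set of replicate counts is given here. The ten counts are hypothetical and serve only to show the calculation; the function name is likewise an assumption.

```python
def poisson_dispersion_chi2(counts):
    """Index of dispersion S(m_s - m)^2 / m for replicate counts."""
    m = sum(counts) / len(counts)
    chi2 = sum((m_s - m) ** 2 for m_s in counts) / m
    return chi2, len(counts) - 1


# Hypothetical set of 10 replicate counts (mean 11).
counts = [12, 9, 15, 10, 14, 8, 11, 13, 9, 9]
chi2_poisson, df_poisson = poisson_dispersion_chi2(counts)
print(round(chi2_poisson, 3), df_poisson)
```

Each set of 10 counts yields one such value on 9 degrees of freedom, and it is the distribution of these values over many sets that is compared with the theoretical one.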





The agreement is quite fair; it is tested in the usual manner by
calculating another

\[ \chi'^2 = S\,\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = 9.604, \]

which for seven degrees of freedom gives P greater than 0.2.








CHAPTER VI 


THE ANALYSIS OF VARIANCE 

Previous chapters are based on the presentation of collections of 
data in the form of frequency distributions and on the description 
of a few features, particularly the extent and form of the variation, 
by frequency constants. This is a process of summarisation in which 
some detail is inevitably lost. We shall now reverse the process and 
deal with statistical methods designed to recover some of the detail 
that is of value, starting from the frequency distribution and its 
constants as bases. Comparatively little has been done in this 
direction with the form of variation, but the amount of variation 
can be split up into parts associated with different causes or sources. 
The method of analysis of variation is introduced in its simple 
quantitative form in this chapter; and most of the subjects dealt 
with in subsequent chapters are developed from the same basic 
method. As a first step, we shall in the next section prove a funda¬ 
mental mathematical property of variance. 

Variance an Additive Quantity 

6.1. Variability may be considered to arise from a multitude of
causes producing small deviations; but it often happens that to these
there are added larger deviations due to a few more important
causes, and the variation is not always homogeneous. It is a
fundamental property of that measure of the degree of variability, the
variance, that it is additive, i.e. if a quantity is subject to the
operation of several independent causes each of which contributes a
certain variance, then the final variance of the quantity is the sum
of those due to the several causes. Let it be assumed, for example,
that a quantity x is subject to random variations, and to others
associated with two factors A and B; then the value of any one
observation of x is

\[ x = \bar{x} + \alpha + \beta + \xi, \]

where $\bar{x}$ is the mean, $\alpha$ and $\beta$ are the deviations
arising from A and B, and $\xi$ is the random deviation. The square of
the deviation of x from its mean is

\[ (x - \bar{x})^2 = \alpha^2 + \beta^2 + \xi^2 + 2\alpha\beta + 2\alpha\xi + 2\beta\xi, \]
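
The additive property can be checked on a small scale. In this Python sketch three hypothetical sets of deviations, each with mean zero, are combined in every possible way, which makes the three causes independent; the variance of the resulting totals is then compared with the sum of the separate variances. The values are invented for illustration.

```python
from itertools import product
from statistics import pvariance

# Hypothetical deviations due to factor A, factor B and random causes.
alpha = [-2, 0, 2]
beta = [-1, 1]
xi = [-3, -1, 1, 3]

# Every combination occurs once, so the causes act independently and the
# product terms of the squared deviation sum to zero.
totals = [a + b + e for a, b, e in product(alpha, beta, xi)]

sum_of_variances = pvariance(alpha) + pvariance(beta) + pvariance(xi)
variance_of_total = pvariance(totals)
print(round(sum_of_variances, 4), round(variance_of_total, 4))
```

If the causes were correlated, so that some combinations occurred more often than others, the product terms would no longer vanish and the simple addition would fail.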



Consider Table 6.1, in which the data (Harris, 1910) are frequency
distributions of ovaries containing different numbers of ovules, and
are for ten separate shrubs of the American Bladder […] differences
between the shrubs, for while shrub 11 […] 22–30 ovules, shrub […]
has ovaries with between […] variations between ovaries on any one
shrub are less than those between all taken together, and the whole
table suggests that we may legitimately divide the total variability
into two parts: one associated with differences between ovaries from
the same shrub, and another with differences between shrubs. We say
that there is an association between the shrubs and ovules per ovary,
and as evidence adduce the fact that the mean variance of deviations
from the shrub means as found by the method of section 5.1, which is
3.057, is very much less than the variance of the “totals” column
(5.385).

Indeed, Table 6.1 is really a manifold contingency table, but the
association is expressed differently because one variate is
quantitative; a treatment by the method of the contingency table would
be very laborious with so many squares.

* $\alpha$ and $\beta$ may be positive or negative.

We may present the analysis of the variability into the two parts
in a systematic manner. Let x be any individual reading,
$\bar{x}_1, \bar{x}_2, \ldots \bar{x}_m$ the m shrub means, $\bar{x}$
the grand mean, and n the number of readings per shrub, so that there
are nm = N readings altogether.

TABLE 6.1

Frequencies of Ovaries (Series O, 1908 C)

 Ovules                     Serial Number of Shrub
  per                                                                  Totals
 Ovary     11    12    13    15    16    18    19    20    21    22

   17       —     —     1     —     —     —     —     —     —     —        1
   18       —     2    23     —     1     6     1     —     —     1       34
   19       —     5    13     —     1     4     5     5     —     1       34
   20       —     8    27     —     1     4     1     5     2     2       50
   21       —    10     4     1     1     5     2     7     1     1       32
   22       4    21    18     —     7    13     9    12     6     2       92
   23       5    17     7     3     9    16    15    25     8    15      120
   24      41    35     7    16    42    48    44    39    61    68      401
   25      21     2     —    25    14     3    18     6    10     8      107
   26       9     —     —    21     9     1     4     1     4     2       51
   27       9     —     —     8     5     —     1     —     3     —       26
   28       4     —     —     8     3     —     —     —     2     —       17
   29       3     —     —    12     3     —     —     —     2     —       20
   30       4     —     —     5     4     —     —     —     1     —       14
   31       —     —     —     1     —     —     —     —     —     —        1

 Totals   100   100   100   100   100   100   100   100   100   100    1 000


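Then, for any ovary on any one shrub, the deviation from the grand
mean is the deviation from the shrub mean plus that of the shrub
mean from the grand mean:

\[ (x - \bar{x}) = (x - \bar{x}_s) + (\bar{x}_s - \bar{x}). \]

We will square these deviations, and sum for all observations from
the one shrub; denoting this summation by S' and applying the
rules in section 1.3 ii we obtain

\[ S'(x - \bar{x})^2 = S'(x - \bar{x}_s)^2 + n(\bar{x}_s - \bar{x})^2 + 2(\bar{x}_s - \bar{x})S'(x - \bar{x}_s). \]

The last term vanishes, since the deviations from a shrub mean sum to
zero, and on summing over all the shrubs (summation S) we have

\[ SS'(x - \bar{x})^2 = SS'(x - \bar{x}_s)^2 + nS(\bar{x}_s - \bar{x})^2. \qquad (6.2) \]

For Table 6.1 this is

\[ 5\,379.775 = 3\,026.350 + 2\,353.425, \]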

and each of these terms is given an appropriate place in Table 6.2
of the Analysis of Variance. The terms are the sums of squares of
deviations, (a) of individual observations from the grand mean,
(b) of individual observations from the shrub means, and (c) of the
shrub means from the grand mean (the sum being multiplied by
n in this case). The degrees of freedom are given in the table; for
[…] add up to give the “total,” and the sums of squares, divided by
the degrees of freedom, give estimates of the variances. That for the
“total” is the ordinary variance of all the observations in the
sample, and that for “within a shrub” is the variance of the
intra-shrub deviations found after the method of section 5.1, and is
an estimate, based on 990 degrees of freedom, of the true variance,
$\sigma_r^2$ (say). The variance “between shrubs” is n times the
square of the standard deviation of shrub means. To investigate this
more fully, let us suppose we have an indefinitely large number of
ovaries from each of an indefinitely large number of shrubs, and let
the variance of these shrub means be $\sigma_s^2$; this is the true
value for the infinite population. If now we have only n ovaries from
each shrub, the means are subject to random errors due to the
intra-shrub variation, and their standard error is
$\sigma_r/\sqrt{n}$. These errors increase the variability of the














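[…] the two components, as shown in equation (6.1), giving

\[ \sigma_s^2 + \frac{\sigma_r^2}{n} \]

[…] the variance $V_s$ of Table 6.2 is an estimate based on 9 degrees
of […] $V_r$, the variance within a shrub, as found from the sample
[…]

\[ V_s \rightarrow n\sigma_s^2 + \sigma_r^2, \qquad V_r \rightarrow \sigma_r^2. \qquad (6.3) \]

These two relations are the basis of the expression of association. If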
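TABLE 6.2

Source of Variation     Sum of Squares    Degrees of Freedom    Variance

Between shrubs             2 353.425               9             261.492
Within a shrub             3 026.350             990               3.057

Total                      5 379.775             999               5.385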


[…] compared with […] and the two estimates $V_s$ and $V_r$ are very
different. If the shrub variation is zero, $\sigma_s^2 = 0$ and $V_s$
tends to equal $V_r$. In such a case, for normally distributed
variates, $V_s$ and $V_r$ are two independent estimates of the same
variance, $\sigma_r^2$. They are subject to the same random errors as
are all estimates of variance, and the significance of any difference
should be tested by the methods of section 5.4. For Table 6.2, the one
(261.492) is so much greater than the other (3.057), that the reality
of the association needs no test. Using the relations of
equation (6.3)

\[ 261.492 \rightarrow 100\sigma_s^2 + \sigma_r^2, \]

\[ 3.057 \rightarrow \sigma_r^2, \]

whence $2.584 \rightarrow \sigma_s^2$.

* The sign → denotes that the quantity on the left is an estimate of
that on the right, and that the former approaches the latter as the
size of the sample (number of degrees of freedom in both parts)
increases indefinitely.
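
The step from the two variances of Table 6.2 to the estimate 2.584 is simple arithmetic, sketched here in Python. The function name is an assumption; the relations used are those quoted for equation (6.3), with n = 100 ovaries per shrub.

```python
def variance_components(v_between, v_within, n_per_array):
    """Estimates of the between-array and within-array true variances.

    Uses v_between -> n * sigma_s^2 + sigma_r^2 and v_within -> sigma_r^2.
    """
    sigma_r2 = v_within
    sigma_s2 = (v_between - v_within) / n_per_array
    return sigma_s2, sigma_r2


sigma_s2, sigma_r2 = variance_components(261.492, 3.057, 100)
print(round(sigma_s2, 3), sigma_r2)
```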






                     Degrees of Freedom        Variance

[…]                  m − 1                     $V_s$
[…]                  m(n − 1) = N − m          $V_r$


[…] should add up to give the total; this provides a check. Usually,
however, this process is laborious and it is convenient to apply a
modification of equation (1.2), p. 37. If the original units are used
so that h = 1, the appropriate part of equation (1.2) may be written

\[ S(x - \bar{x})^2 = Sx^2 - \frac{T^2}{N} \qquad (6.4) \]

where $\bar{x}$ and N have their usual meanings and T is the total
value of the variate for the N observations, i.e.
$T = Sx = N\bar{x}$.
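Applying this to equation (6.2) and writing $T = SS'x$ and
$T_s = S'x$ for the sth shrub we have

\[ SS'(x - \bar{x})^2 = SS'x^2 - \frac{T^2}{N}, \qquad SS'(x - \bar{x}_s)^2 = SS'x^2 - \frac{ST_s^2}{n} \]

and

\[ nS(\bar{x}_s - \bar{x})^2 = \frac{ST_s^2}{n} - \frac{T^2}{N}. \qquad (6.5) \]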



The process of computation may be written in words: 

(1) Square the individual values and add, 

(2) Find the total value of the variate for the arrays, square, add, 

and divide the result by the number of individuals per array, 
and 

(3) Find the total value of the variate for all individuals, square, 

and divide by the grand total number of individuals. 

Then if (1), (2) and (3) represent the results of the above operations,
equation (6.2) becomes

[(1) − (3)] = [(1) − (2)] + [(2) − (3)].
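
The three operations may be sketched in Python for the data of Table 6.1. The frequencies are those of the table (the entry for 30 ovules on shrub 16 is taken as 4, the value required by both its row and column totals); the function and variable names are illustrative only.

```python
# Ovules per ovary -> frequencies for shrubs 11, 12, 13, 15, 16, 18, 19, 20, 21, 22.
TABLE_6_1 = {
    17: [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    18: [0, 2, 23, 0, 1, 6, 1, 0, 0, 1],
    19: [0, 5, 13, 0, 1, 4, 5, 5, 0, 1],
    20: [0, 8, 27, 0, 1, 4, 1, 5, 2, 2],
    21: [0, 10, 4, 1, 1, 5, 2, 7, 1, 1],
    22: [4, 21, 18, 0, 7, 13, 9, 12, 6, 2],
    23: [5, 17, 7, 3, 9, 16, 15, 25, 8, 15],
    24: [41, 35, 7, 16, 42, 48, 44, 39, 61, 68],
    25: [21, 2, 0, 25, 14, 3, 18, 6, 10, 8],
    26: [9, 0, 0, 21, 9, 1, 4, 1, 4, 2],
    27: [9, 0, 0, 8, 5, 0, 1, 0, 3, 0],
    28: [4, 0, 0, 8, 3, 0, 0, 0, 2, 0],
    29: [3, 0, 0, 12, 3, 0, 0, 0, 2, 0],
    30: [4, 0, 0, 5, 4, 0, 0, 0, 1, 0],
    31: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
}


def analyse(table):
    """Sums of squares and degrees of freedom from operations (1), (2), (3)."""
    arrays = len(next(iter(table.values())))
    n_s = [sum(row[j] for row in table.values()) for j in range(arrays)]
    t_s = [sum(x * row[j] for x, row in table.items()) for j in range(arrays)]
    n_total, t_total = sum(n_s), sum(t_s)
    op1 = sum(x * x * sum(row) for x, row in table.items())  # (1) squares of values
    op2 = sum(t * t / n for t, n in zip(t_s, n_s))           # (2) array totals squared / n
    op3 = t_total ** 2 / n_total                             # (3) grand total squared / N
    return {
        "total": (op1 - op3, n_total - 1),
        "within": (op1 - op2, n_total - arrays),
        "between": (op2 - op3, arrays - 1),
    }


result = analyse(TABLE_6_1)
for source, (ss, df) in result.items():
    print(f"{source:8s} {ss:10.3f} {df:4d} {ss / df:9.3f}")
```

Run on these frequencies, the sketch gives 5 379.775 for the total, 3 026.350 within shrubs and 2 353.425 between shrubs.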

When the data are grouped, readers should have no difficulty in
applying equation (1.3), p. 38. In such instances, it is better not 
to use Sheppard’s corrections, but to keep the grouping fairly fine 
so that they are unimportant. The precise effect of these corrections 
is not certain; but they increase the apparent association by reducing 
the residual variance, so to neglect them in testing significances is 
to be on the safe side. 

It may be convenient to measure the values of the variate as
deviations from some arbitrary origin and to divide them by an
arbitrary constant h before summing and squaring. Then after applying
equations (6.5) it is only necessary to multiply the resultant
sums of squares by $h^2$. The process of computation based on
equations (6.5) may easily be performed exactly, with the aid of a
table of squares, down to the last stages of dividing by the numbers of

* These values of T are not the total numbers of individuals but are the
total amounts of variate in the several arrays. In Table 6.1, $T_s$ is
the total number of ovules in the 100 ovaries from the sth shrub.











$z = \tfrac{1}{2}[\log_e 95.\ldots - \log_e 88.2\ldots] = 0.0381$,
$n_1$ = 19 and $n_2$ = 80.

From Fisher's table we see that when $n_1$ = 24 and $n_2$ = infinity,
a z of 0.2085 lies on the 0.05 point, so our z of 0.0381 is quite
insignificant, and the two estimates of variance are equivalent.

Tables with Non-uniform Array Totals 

6.4. The analysis of variance may be performed on tables in which
the numbers of individuals in the arrays are not equal. If the numbers
in the arrays are $n_1, n_2, \ldots n_m$, equation (6.2) becomes

\[ SS'(x - \bar{x})^2 = SS'(x - \bar{x}_s)^2 + Sn_s(\bar{x}_s - \bar{x})^2 \qquad (6.6) \]

and $S(T_s^2/n_s)$ is written for $ST_s^2/n$ in equations (6.5).

In such instances, the relations of equations (6.3) do not hold, for
in summing the squares of the deviations of the group means from
the grand mean, each group has been given a different weight, $n_s$.
If the true variance between groups is zero, however, […] and the
test for the existence of association is that $V_s$ and $V_r$ are
significantly different.
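
A short Python sketch of the computation for arrays of unequal size follows. The three groups of readings are hypothetical, and the function name is an assumption; the point shown is only that each array total is squared and divided by its own number before adding.

```python
def analysis_unequal(arrays):
    """Sums of squares of equation (6.6) for arrays of unequal size."""
    values = [x for a in arrays for x in a]
    n_total = len(values)
    op1 = sum(x * x for x in values)                  # squares of the individual values
    op2 = sum(sum(a) ** 2 / len(a) for a in arrays)   # S(T_s^2 / n_s)
    op3 = sum(values) ** 2 / n_total                  # T^2 / N
    m = len(arrays)
    return {
        "total": (op1 - op3, n_total - 1),
        "within": (op1 - op2, n_total - m),
        "between": (op2 - op3, m - 1),
    }


# Hypothetical arrays with 3, 4 and 2 readings.
groups = [[21, 23, 22], [25, 27, 26, 28], [20, 19]]
parts = analysis_unequal(groups)
v_s = parts["between"][0] / parts["between"][1]
v_r = parts["within"][0] / parts["within"][1]
print(parts, round(v_s, 3), round(v_r, 3))
```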



























[Table: Length of Cuckoos' Eggs in mm. (central values), by Nest Type.
The body of the table is not recoverable. The surviving totals are
N = 120 and T = 246, and the last row gives $T_s^2/n_s$ for the six
nest types, followed by $T^2/N$:]

69.69   405.60   401.78   110.25   273.07   317.40        504.30





TABLE 6.6

Source of Variation      Sum of Squares    Degrees of Freedom    Variance

Between nest types          1 073.49               5               214.7
Within a nest type          2 356.21             114                20.7

Total                       3 429.70             119                 —


From these, the terms of equation (6.6) are found and inserted in
Table 6.6. The variance between nest types is greater than the
residual, and we will test it for significance. We find

$z = \tfrac{1}{2}[\log_e 214.7 - \log_e 20.7] = 1.17$, $n_1$ = 5 and
$n_2$ = 114.

For $n_1$ = 5 and $n_2$ = 60, z = 0.7798 lies on the 0.1 per cent.
point, so we must conclude that the association between egg-length
and type of nest is real. These variances are in terms of the arbitrary
units (0.2 mm.), and if they are required in mm.² they must be
multiplied by 0.04.
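
The arithmetic of this test can be sketched in Python. The six values of $T_s^2/n_s$ and the totals N = 120 and T = 246 are read from the surviving last row of the data table, an interpretation that is an assumption here, though it reproduces the sum of squares between nest types; the function name is hypothetical.

```python
import math


def fisher_z(v_larger, v_smaller):
    """z = half the difference of the natural logarithms of two variances."""
    return 0.5 * (math.log(v_larger) - math.log(v_smaller))


ts2_over_ns = [69.69, 405.60, 401.78, 110.25, 273.07, 317.40]  # one per nest type
correction = 246 ** 2 / 120                                    # T^2 / N

between_ss = sum(ts2_over_ns) - correction
v_between = between_ss / 5
v_within = 2356.21 / 114
z = fisher_z(v_between, v_within)

# Variances in working units of 0.2 mm. are converted to mm.^2 by 0.2^2 = 0.04.
v_between_mm2 = v_between * 0.04
print(round(between_ss, 2), round(z, 2), round(v_between_mm2, 2))
```

The value of z, about 1.17, is well beyond the 0.1 per cent. point quoted.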


Relation between z and t Tests

6.5. A special case of analysis occurs when there are two groups;
this is the comparison between two samples. If there are $n'_1$ and
$n'_2$ observations in each group, $\bar{x}_1$ and $\bar{x}_2$ are the
means, and $\bar{x}$ is the grand mean, the analysis of variance is as
set out in Table 6.7.

$S_1$ and $S_2$ are the summations over samples 1 and 2, and S is the
summation over the whole; also, if $V_s$ is greater than $V_r$,

\[ z = \tfrac{1}{2}[\log_e V_s - \log_e V_r] = \log_e \sqrt{V_s / V_r} \]

and the degrees of freedom are $n_1$ = 1 and
$n_2 = n'_1 + n'_2 - 2$.

The square root of the ratio of variances is the t of equation (5.3),
p. 115, which was used in testing the difference between two means.
Thus, if we make the transformation

\[ z = \tfrac{1}{2}\log_e t^2 \quad\text{or}\quad t = e^z, \]
TABLE 6.7

Analysis of Variance (Two Groups)

Source of Variation   Sum of Squares                                                    Degrees of Freedom    Variance

Between samples       $n'_1(\bar{x}_1 - \bar{x})^2 + n'_2(\bar{x}_2 - \bar{x})^2$       1                     $V_s$
Within a sample       $S_1(x - \bar{x}_1)^2 + S_2(x - \bar{x}_2)^2$                     $n'_1 + n'_2 - 2$     $V_r$

Total                 $S(x - \bar{x})^2$                                                $n'_1 + n'_2 - 1$     —

the distributions of t for n degrees of freedom, say, and of z for
$n_1$ = 1 and $n_2$ = n are equivalent. This is because z and t are
mathematically related; for any value of t there is only one value
of z. A slight modification in interpretation of the z test is
necessary before we can show its equivalence to that based on t. For
the simple use of z to test the difference between two variances, we
showed in section 5.4 that the 0.05 level of significance is about the
2.5 per cent. point; on the other hand, when used as an alternative
to t, the 0.05 point of z is also the 0.05 level. When testing two
variances, the hypothesis is that in the population z = 0, and a
positive or negative deviation, if sufficiently large, may be
incompatible with this.
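
The relation between the two tests can be verified on any pair of samples. The Python sketch below uses two small hypothetical samples, and it assumes that the t of equation (5.3) is the usual one based on the pooled variance within samples; the function names are illustrative.

```python
import math
from statistics import mean


def within_variance(a, b):
    """Pooled variance within the two samples."""
    ss = sum((x - mean(a)) ** 2 for x in a) + sum((x - mean(b)) ** 2 for x in b)
    return ss / (len(a) + len(b) - 2)


def pooled_t(a, b):
    """t for the difference between two means (pooled-variance form assumed)."""
    v_r = within_variance(a, b)
    return (mean(a) - mean(b)) / math.sqrt(v_r * (1 / len(a) + 1 / len(b)))


def two_group_z(a, b):
    """z from the analysis of variance with two groups (one degree of freedom)."""
    grand = mean(a + b)
    v_s = len(a) * (mean(a) - grand) ** 2 + len(b) * (mean(b) - grand) ** 2
    return 0.5 * (math.log(v_s) - math.log(within_variance(a, b)))


# Hypothetical samples.
sample_1 = [5.1, 4.8, 5.6, 5.0]
sample_2 = [4.2, 4.5, 4.1, 4.6, 4.4]

t = pooled_t(sample_1, sample_2)
z = two_group_z(sample_1, sample_2)
print(round(t, 4), round(math.exp(z), 4))
```

The two printed figures agree, since the ratio of the two variances is the square of t.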





[…] are a little too much alike. These varying distinctions […]
points and levels of significance are very puzzling, but […]

* The fact […]




























                                Nest Number
               668        670        672        674        675       Means

November      2.273      2.479      2.404      2.447      2.456     2.411 8
January       2.332      2.603      2.457      2.388      2.626     2.481 2
March         2.375      2.613      2.452      2.515      2.633     2.517 6
May           2.373      2.557      2.396      2.445      2.487     2.451 6
August        2.318      2.377      2.279      2.312      2.410     2.339 2

Means         2.334 2    2.525 8    2.397 6    2.421 4    2.522 4   2.440 3


months, 672 and 674 nearly always come next, and 668 has the 
lowest mean in all but the month of August. 

The reality of the differences between months is not quite so clear,
for although August has the lowest mean in all nests but one, and
March the highest in all but another, the other months show no
consistent differences. We can, however, test this in the same way
by analysing the total variance into two parts, as in the second part
of Table 6.9. The “between months” variance is certainly greater
than the residual, but […]


* This sort of table must not be confused with one like Table 6.1. Here 
the entries are measurements, there they are frequencies; here the characters 
are three in number (head breadth, nest number, and month of year), there 
they are only two (ovules per ovary and shrub number).
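
Both analyses of the table of head breadths may be sketched in Python. The entries are those of the table by nest and month (the November entry for nest 672 is taken as 2.404, the value implied by the marginal means); the function name is hypothetical.

```python
HEAD_BREADTH = {  # columns are nests 668, 670, 672, 674, 675
    "November": [2.273, 2.479, 2.404, 2.447, 2.456],
    "January": [2.332, 2.603, 2.457, 2.388, 2.626],
    "March": [2.375, 2.613, 2.452, 2.515, 2.633],
    "May": [2.373, 2.557, 2.396, 2.445, 2.487],
    "August": [2.318, 2.377, 2.279, 2.312, 2.410],
}


def between_and_residual(arrays):
    """Variance between arrays and the residual variance within arrays."""
    values = [x for a in arrays for x in a]
    correction = sum(values) ** 2 / len(values)
    total_ss = sum(x * x for x in values) - correction
    between_ss = sum(sum(a) ** 2 / len(a) for a in arrays) - correction
    df_between = len(arrays) - 1
    df_residual = len(values) - len(arrays)
    return between_ss / df_between, (total_ss - between_ss) / df_residual


by_month = list(HEAD_BREADTH.values())
by_nest = [list(column) for column in zip(*by_month)]

nest_between, nest_residual = between_and_residual(by_nest)
month_between, month_residual = between_and_residual(by_month)
print(round(nest_between, 5), round(nest_residual, 5))
print(round(month_between, 5), round(month_residual, 5))
```

The first line reproduces the variances 0.034 36 and 0.006 55 on 4 and 20 degrees of freedom; the second gives the corresponding division of the same total by months.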

















Between nests     0.1[…]      4     0.034 36
Within a nest     0.1[…]     20     0.006 55



[…] strength. As in all such tests, the probability of significance,
depending as it does on the size of the sample, gives no indication
of that strength.

Although the methods and tests described in this chapter are only
applicable when the distribution within each array is normal, work
by E. S. Pearson and others, published in Biometrika, indicates that
moderate skewness, such as is apparent in Table 6.1, is not likely

































[…] distribution shows the extent and […] of arrays introduced in the
last […] parts of the variation with other […] discovers the nature
of that asso[…] a quantitative variate. It will be […] may be
approached in two ways; […] the one which is being analysed, […]
arrays.

[…] it is convenient to choose the […] to twenty for each variate; if
any […] on the dividing line between two […] each, and similarly it
may happen […] assigned to each of four adjacent […] characters,
either the sub-ranges



























Fig. 8.—Scatter Diagram.


particularly if there are more than two characters; it is often possible 
to collect the data on cards and so avoid copying. 

If the observations are few, a correlation table is not suitable, and 
its place may be taken by a scatter diagram. This is just an ordinary 
graph in which x and y are the two variates, and points on this 
represent observations; Fig. 8 is an example, and shows the 
relation between length and fineness of the hairs of a number of 
varieties of cotton.* If the relationship between the two variates
is exact, the points in the scatter diagram lie on a smooth curve, 
while as the relationship weakens, the diagram more and more 

* Data by Morton (1926).
TABLE 7.1—continued

Length and Longitudinal Girth of Eggs of the Common Tern

r = +0.89

(Data from “A Co-operative Study,” 1933)

TABLE 7.4

Daily Maximum and Minimum Temperatures at Rothamsted for August 1878–1926*

(Data by Fisher and Hoblyn, 1928.)   r = +0.30

* This table is condensed from that of the original paper by doubling the sub-ranges of the arrays.
results. The progress in devising systems of mathematical formulæ
to describe such surfaces is less than has been made for the single
variate distributions, and methods based on the normal surface are
applied to most distributions. The equation to the normal surface is

\[ df = \frac{N}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}} \exp\left\{-\frac{1}{2(1 - r^2)}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y}\right)\right\} dx\,dy, \]

where x and y are the two variates measured as deviations from their
means, $\sigma_x$ and $\sigma_y$ are constants equal to the standard
deviations of the two variates, r is a constant equal to the
correlation coefficient, N is the total number in the sample, and df
is the element of frequency in the range dx dy with its centre at
(x, y). If we cut this surface by any vertical plane parallel to
either the x- or y-axis, the section is a normal frequency curve of
standard deviation $\sigma_x\sqrt{1 - r^2}$ or
$\sigma_y\sqrt{1 - r^2}$. Fig. 9 shows the contours of surfaces in
which correlation coefficients are 0, 0.3, 0.6 and 0.9, and the
standard deviations of x and y are both equal to unity; it will be
seen that the surfaces all rise to a hump at the centre, but that they
tend also to form a diagonal ridge which becomes narrower and sharper
as r increases, as would be expected from the fact that the standard
deviation of a sectional curve becomes smaller as r approaches unity.
The correlation coefficient can never be greater than unity, and as it
approaches that value, the ridge tends to become a thin outline in the
form of a normal curve running diagonally; when r = 0, vertical planes
through the centre parallel to the x- and y-axes divide the surface
into four equal quadrants. The correlation coefficient can be
negative, but the only difference that makes is to cause the major
axes of the ellipses to follow the other diagonal.
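
The normal surface and its sections may be explored numerically. This Python sketch evaluates the frequency per unit area of the surface, with x and y measured from their means; the function names are assumptions made for the illustration.

```python
import math


def normal_surface(x, y, sd_x, sd_y, r, n_total=1.0):
    """Frequency per unit area of the normal surface at the point (x, y)."""
    q = (x / sd_x) ** 2 + (y / sd_y) ** 2 - 2 * r * x * y / (sd_x * sd_y)
    scale = n_total / (2 * math.pi * sd_x * sd_y * math.sqrt(1 - r * r))
    return scale * math.exp(-q / (2 * (1 - r * r)))


def section_sd(sd, r):
    """Standard deviation of a section parallel to one of the axes."""
    return sd * math.sqrt(1 - r * r)


# With unit standard deviations and r = 0.6, the section at y = 1 is a normal
# curve centred at x = 0.6 with standard deviation 0.8, so the ordinate one
# standard deviation from its centre is exp(-1/2) of the central ordinate.
central = normal_surface(0.6, 1.0, 1.0, 1.0, 0.6)
one_sd_out = normal_surface(0.6 + section_sd(1.0, 0.6), 1.0, 1.0, 1.0, 0.6)
print(round(section_sd(1.0, 0.6), 3), round(one_sd_out / central, 4))
```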


Regression Lines

7.22. An easier way of treating a correlation table is to draw a curve
relating the means of the arrays of x and y. This can be done in two
ways; either we may find the mean of y for every array of x (i.e.
referring to Table 7.1, find the mean girths when the lengths of the
eggs are successively 3.55–3.60, 3.75–3.80, etc., cm.)
or we may find the mean of x for every array of y (mean lengths
when the girths are successively 9.80–9.90, 10.30–10.40,
10.40–10.50, etc., cm.). In Fig. 10 are given the array means for each of




Fig. 9.—Normal Frequency Surfaces.
























etc., cm., we have the means in Table 7.11, with length as the
independent variable. To combine the three readings, we find the
weighted means of the length and girth. That of the length is

\[ \frac{3.575 + 2 \times 3.725 + 6 \times 3.775}{9} = 3.742 \]

and similarly that of the girth is 10.59. Points obtained in this way
are shown by larger circles and dots in Fig. 10.
independent variable, and similarly the other, being more nearly
parallel to the y-axis, has y as the independent variable.

The general equation of a straight line with x as the independent
variable is

\[ Y = ax + b \]

where a and b are constants and the capital letter Y denotes the
value of the dependent variable given by the line, as distinguished
from the actual value of an individual, denoted by y. The problem
of estimating the most appropriate values of a and b to fit a line to
any given data is analogous to that of determining estimates of
parameters for fitting frequency distributions, and the general method
of solution that is most in accordance with statistical theory is the
method of least squares. This consists in choosing the constants a
and b so that the sum over all individuals of the quantities
$(y - Y)^2$* is reduced to a minimum. A similar line may be obtained
giving X in terms of y. The equations to the two lines are deduced in
section 7.6 and may be written

\[ (Y - \bar{y}) = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \qquad (7.1) \]

\[ (X - \bar{x}) = r\frac{\sigma_x}{\sigma_y}(y - \bar{y}) \qquad (7.11) \]

where $\bar{x}$ and $\bar{y}$ are the two grand means, $\sigma_x$ and
$\sigma_y$ are the two standard deviations, and r is the correlation
coefficient. The value of Y given by equation (7.1) is an estimate of
the mean value of the array corresponding to a small sub-range about
any given value of x, and that of X given by equation (7.11) is
similarly the mean of the array corresponding to a given value of y.
Equation (7.1) represents the line that is more nearly parallel to the
x-axis; both lines pass through the point $(\bar{x}, \bar{y})$. These
are called regression lines and the constants

\[ r\frac{\sigma_y}{\sigma_x} \quad\text{and}\quad r\frac{\sigma_x}{\sigma_y} \]

are called regression coefficients. The coefficients are the tangents
of […]tion indicating that x increases with y, while if r is negative
the slopes are in the opposite direction, showing that x increases as y

* In a diagram, these are the distances of the points from the line
measured in a direction parallel to the y-axis. They are not
perpendicular distances from the line. Further, these points represent
individuals, and not array means. If array means are used, a weighted
sum of $(\bar{y}_s - Y)^2$ is reduced to a minimum and a similar
result is obtained.

Analysis of Variance

7.23. The preceding section, dealing with regression, pays attention
to the relationship between x and y, regarding the deviations rather
as errors due to uncontrollable factors; we will now consider the
correlation table from the point of view of the variability. Adopting
the method of the last chapter, we can analyse the variance of, say,
the girth (Table 7.1) into two portions, one associated with
differences between the array means for the various groups of length,
and the other with residual deviations within the arrays. This can
be done by finding the individual array means, as was done in the
last chapter, and calculating their variance and the variance of the
residual deviations from them. For Table 7.1 we should combine the
first five and the last two, giving 18 arrays, or 17 degrees of freedom
on which to estimate the variance between arrays, and 937 on which
to estimate the residual. If, however, we are satisfied that a straight
regression line like equation (7.1) sufficiently represents the trend,
the value Y of the girth given by this line for various central values
of the length (x) may be used instead of the array means
$\bar{y}_s$. For Table 7.1 the line of regression of y on x (as
computed in section 7.5) is

(Y − 11.378) = 1.756 (x − 4.190);

when x = 3.575, Y − 11.378 = −1.080 and Y = 10.298,
when x = 3.725, Y − 11.378 = −0.817 and Y = 10.561,

and so on. For the sum of squares of the deviations of the array
means from the grand mean (11.378) we take 1 × (−1.080)²
+ 2 × (−0.817)², etc., adding these quantities for all arrays; for
the squares of the residual deviations from the array means we take
(9.85 − 10.298)² + (10.35 − 10.561)² + (11.05 − 10.561)², etc.,
adding these quantities for all cells in the table, while for the total
sum of squares we use the marginal totals as usual. These three
sums may be found and entered in a table of analysis of variance
in just the same way as if we had used the actual array means.
This process can be carried out explicitly as indicated, and the
reader is recommended to do it as an exercise; alternatively, all the
various sums of squares may be found from some of the constants
of the table as shown below. We will set out the process algebraically.
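
The values of Y quoted for the first two arrays follow at once from the regression line. A brief Python sketch, using only the constants given in the text (the function name being an assumption):

```python
def girth_from_length(x, mean_x=4.190, mean_y=11.378, slope=1.756):
    """Girth Y given by the line of regression of y on x for Table 7.1."""
    return mean_y + slope * (x - mean_x)


for central_length in (3.575, 3.725):
    y_line = girth_from_length(central_length)
    print(central_length, round(y_line - 11.378, 3), round(y_line, 3))

# Contribution of the first two arrays (1 and 2 individuals) to the sum of
# squares of deviations of the line values from the grand mean.
first_terms = (1 * (girth_from_length(3.575) - 11.378) ** 2
               + 2 * (girth_from_length(3.725) - 11.378) ** 2)
```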

Repeating equation (6.6), section 6.4, except that the variable is
changed to y, we have for the first method of analysis,

\[ SS'(y - \bar{y})^2 = SS'(y - \bar{y}_s)^2 + Sn_s(\bar{y}_s - \bar{y})^2, \qquad (7.2) \]

and for the second method, using the Y given by equation (7.1)
instead of $\bar{y}_s$, we have

\[ S(y - \bar{y})^2 = S(y - Y)^2 + S(Y - \bar{y})^2. \qquad (7.3) \]
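
That the total sum of squares divides exactly into these two parts when Y is given by the least-squares line may be confirmed numerically. The five pairs of readings below are hypothetical, and the function name is an assumption.

```python
from statistics import mean


def regression_sums_of_squares(xs, ys):
    """Total, residual and regression sums of squares of equation (7.3)."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    line = [my + slope * (x - mx) for x in xs]             # values of Y
    total = sum((y - my) ** 2 for y in ys)                 # S(y - ybar)^2
    residual = sum((y - v) ** 2 for y, v in zip(ys, line)) # S(y - Y)^2
    regression = sum((v - my) ** 2 for v in line)          # S(Y - ybar)^2
    return total, residual, regression


xs = [1, 2, 3, 4, 5]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]
total, residual, regression = regression_sums_of_squares(xs, ys)
print(round(total, 4), round(residual, 4), round(regression, 4))
```

The identity holds only for the least-squares line; for any other line through the means a product term remains.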
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We shall now discuss the degrees of freedom. We have seen that 
in an infinite population for which x and y are independent, the 
regression of j on a; is a line through the mean, y, and parallel to 
the AJ-axis; it follows that for such a case (y — Y) equals (j ~ y) 
and the residual sum of squares equals the total. For a finite sample, 
however, owing to random errors a regression line will have a slight 
slope and the residual sum of squares will be different from the 
tot^. Moreover, since the regression line is obtained by the method 
of least squares the residual must always be less than the total. It 
follows from a proof given by Fisher (19256) that if the total sum 
of squares has n degrees of freedom, the amount of this difference 
is, on the average, one nth. of the total sum of squares, and that the 
residual sum of squares divided hy n — i is an unbiassed estimate 
of the variance of y in the population, having the sampling distribu¬ 
tion of an estimate based on n — i degrees of freedom.* Applying 
this result to Table 7.6 and remembering that there are N — 1 
degrees for the total sum of squares, we see that when the asso¬ 
ciation in the population is zero, the three variances in the last 
column are estimates of the same variance, that of j in the population. 

Thus, for zero association in the population, Table 7.6 is analogous 
to Table 6.3 with two arrays (one degree of freedom). If, now, there 
is an association between x and y, the difference between the residual 
and total sum of squares is greater than the proportion due to one 
degree, yV^ is less than yVx and is greater than either. We may 
carry the analogy between Tables 6.3 and 7.6 further by regarding 
y% as the variance due to the regression line; we shall make no 
attempt, however, to find an analogue to o-f in Table 6.3. The 
value yVf is an estimate of the true variance within arrays, and also 
of the variance of the distribution represented by a cross-section of 

* This is a special case of a more general result that if to a series of N
independent observations an equation or series of equations involving u
constants altogether is fitted by the method of least squares, the sum of
squares of the deviations from the values given by the equations, i.e. the
residual sum of squares, divided by N − u is an unbiassed estimate of the
variance in the population. Thus, we may say that each constant so determined
from the data absorbs one degree of freedom. The one degree of
freedom associated with the linear regression line is due to the slope; the
other constant, $\bar{y}$, is eliminated from both the residual and total sum of
squares. This result also follows from a proof given by Irwin (1931).

This property is a powerful recommendation for the use of the method 
of least squares in statistical work. 
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of them we use the regression formula to guess at the other, we shall 
be right six times in ten trials. Such an explanation is nonsense, 
and the truth is that the standard error of our guess will be reduced
to $\sqrt{1 - 0.6^2} = 0.8$ of what it would have been had we not used
the formula. Such a reduction in standard deviation may be further
interpreted in terms of reduction in frequencies of estimates deviating 
from the actual values by given amounts, on referring to section 2.5Z. 

TABLE 7.7*

Correlation      Ratio of Variances    Ratio of Standard
Coefficient r    Residual/Total        Deviations            z'

zero             1.00                  1.000                 zero
0.2              0.96                  0.980                 0.20
0.4              0.84                  0.917                 0.42
0.6              0.64                  0.800                 0.69
0.7              0.51                  0.714                 0.87
0.8              0.36                  0.600                 1.10
0.9              0.19                  0.436                 1.47
1.0              zero                  zero                  infinity

* The column under z' will be referred to in the next chapter.

The following typical cases may assist the reader. Yule (1927)
gives the correlation coefficient between the ages of husbands and
wives in England and Wales as 0.91, and that between the age and
the standard of elementary school boys is of the same order (Jones,
1910). The likeness between parents and children is expressed by a
coefficient of about 0.47 as far as height and several other physical
characters are concerned (Snow, 1911), and that between cousins or
uncles and nephews, which is so small as scarcely to be noticeable
to the "man in the street," has a correlation coefficient of about 0.26.
Similarly, taking periods of six days as units, the correlation between
the rainfall and hours of bright sunshine in Hertfordshire is negative
(i.e. increased rainfall is associated with decreased sunshine) and
about 0.2.
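
The columns of Table 7.7 can be recomputed from r alone. The sketch below assumes that the residual/total ratio of variances is 1 − r², that the ratio of standard deviations is its square root, and that z' is Fisher's transformation ½ log((1 + r)/(1 − r)); the tabulated values are consistent with these assumptions. The function name is illustrative only.

```python
import math

def table_7_7_row(r):
    """Return (variance ratio, standard-deviation ratio, z') for a correlation r with |r| < 1."""
    variance_ratio = 1.0 - r * r           # residual / total variance
    sd_ratio = math.sqrt(variance_ratio)   # residual / total standard deviation
    z_prime = math.atanh(r)                # equals 0.5 * log((1 + r) / (1 - r))
    return variance_ratio, sd_ratio, z_prime

# r = 1 is left out because z' is infinite there.
for r in (0.0, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9):
    v, s, z = table_7_7_row(r)
    print(f"{r:.1f}  {v:.2f}  {s:.3f}  {z:.2f}")
```

Reading down the printed rows shows how slowly the standard deviation falls: a correlation of 0.6 still leaves four-fifths of it.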

The uses (and abuses) of the coefficient of correlation are many,
but it may be always safely regarded as a constant descriptive of the
properties of the sample. For instance, Weldon found that the correlation
between number of pistils and stamens of late flowers of
Table 7.2 was 0.75, while for the early ones, Table 7.3, it was 0.51;
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ities are associated ( 
rence often made i 
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_? to all other variations, and if the variance of the indepei 

variable can be controlled, either experimentally or by selection,
the correlation can be altered. It is thus not a physical quantity in
the way the regression is. To demonstrate this, we will suppose the
independent variable to be x, the dependent one y, the regression a,
the correlation coefficient r, and the standard deviations $\sigma_x$ and $\sigma_y$.
Then if $\sigma_r$ is the standard deviation of the residual deviations of y
from the regression line, which we may suppose to be constant for
all values of x,

$$\sigma_r = \sigma_y\sqrt{1 - r^2}.$$

But

$$a = r\frac{\sigma_y}{\sigma_x},$$

whence

$$\sigma_r^2 = a^2\sigma_x^2\,\frac{1 - r^2}{r^2}, \quad\text{or}\quad r^2 = \frac{a^2\sigma_x^2}{a^2\sigma_x^2 + \sigma_r^2}.$$

Now $\sigma_r$ and a are assumed to be constant, so that r is not independent
of $\sigma_x$. For the relationship between numbers of pistils x and
stamens y, r = 0.749; $\sigma_x$ = 3.388, $\sigma_y$ = 3.298, $\sigma_r$ = 2.185,
a = 0.729 and $a\sigma_x$ = 2.470. If now we had selected flowers so that
the standard deviation of the number of pistils had been reduced by
one-half, we should have reduced the correlation coefficient from
0.749 to 0.492, and we should have increased it to 0.914 if we had
been able to increase the variation of pistils so as to double the standard
deviation (these results may be obtained by substitution in the
above formula). The same consideration applies when the effect of
some factor like temperature on some variable quantity is being
investigated experimentally; by altering the range of temperatures,
the correlation coefficient may be altered considerably, and so it is
a result of the arbitrary experimental conditions. When such conditions
are under the control of the experimenter, the amount of
variation in x should be made as great as possible so as to make r
large; we shall see in the next chapter that this gives a maximum
precision to the measurement of a. A low correlation coefficient
does not necessarily mean that x is incapable of having an important
influence on y; it may mean that an insufficiently large range of x
has been tried. Or if x cannot be varied, it may be that although
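
The dependence of r on the spread of x can be checked numerically. The following is a minimal sketch, assuming the relation r² = a²σx² / (a²σx² + σr²) with the regression a and the residual standard deviation σr held fixed; the function name is hypothetical, and the inputs are the rounded pistil and stamen figures quoted above.

```python
import math

def correlation_from_spread(a, sigma_x, sigma_r):
    """Correlation implied by a fixed regression a, a residual s.d. sigma_r and a chosen s.d. of x."""
    explained = (a * sigma_x) ** 2
    return math.sqrt(explained / (explained + sigma_r ** 2))

a, sigma_x, sigma_r = 0.729, 3.388, 2.185
for factor in (0.5, 1.0, 2.0):
    r = correlation_from_spread(a, factor * sigma_x, sigma_r)
    print(f"s.d. of x multiplied by {factor}: r = {r:.4f}")
```

Because the inputs are rounded to three or four figures, the values printed agree with the quoted 0.492, 0.749 and 0.914 only to within rounding in the last place.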
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of section 7.21. For x, y, and it gives the estimates already 
deduced for single distributions and for r it gives

$$r = \frac{S(x - \bar{x})(y - \bar{y})}{\sqrt{S(x - \bar{x})^2\,S(y - \bar{y})^2}}, \tag{7.4}$$

where S is the summation over all individuals.

To be consistent with the earlier treatment of frequency distributions
we should denote the population value by $\rho$ and describe the
above value of r as an estimate obtained from a sample.

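From equations (7.1), (7.11) and (7.4) we have for the regression
coefficients

$$\text{Regression of } y \text{ on } x = r\frac{\sigma_y}{\sigma_x} = \frac{S(x - \bar{x})(y - \bar{y})}{S(x - \bar{x})^2} \tag{7.5}$$

and

$$\text{Regression of } x \text{ on } y = r\frac{\sigma_x}{\sigma_y} = \frac{S(x - \bar{x})(y - \bar{y})}{S(y - \bar{y})^2}. \tag{7.51}$$

If N is the size of the sample,

$$\frac{S(x - \bar{x})(y - \bar{y})}{N}$$

is called the first product moment (= p, say), whence we see that

$$r = \frac{N}{N - 1}\cdot\frac{p}{\sigma_x\sigma_y},$$

where $\sigma_x$ and $\sigma_y$ are calculated correctly on N − 1 degrees of freedom;
if the sample is large N/(N − 1) may be taken as nearly equal
to unity.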

For moderately small samples, r may be calculated directly from 
the individual observations, using equation (7.5), but when the sample 
is large and the data are grouped, a rather more elaborate method of 
computation, such as is illustrated in the next section, is advisable. 



Computation of Coefficients 

7.5. The denominators of the above equations may be computed
by the methods of section 1.31 or 6.3, and here we need only concern 
ourselves with the product term in the numerators. It will be con- 


To find the product moment, we must first find Sx'y'. This could 


e one m 


154, the next in the top row would be −14 × −10 = +140, and
so on), and then multiplying each x'y' by the number in the cell
and adding; in this table, all the squares in the two quadrants
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in the other two would have negative ones. A better method, how¬ 
ever, is to do the summation in two parts, keeping 3;' constant first, 
summing all the x' in each array of y' separately, and then adding
the arrays. If we consider any array of y' with $n_y$ observations:

$$Sx'y' = S_s\,y'S'x' = S_s\,y'\,n_y\bar{x}'_y, \tag{7.63}$$

where $S$, $S_s$ and $S'$ are the summations used previously, and $\bar{x}'_y$
is the mean x' of all the observations in the sth array of y.

The process is carried out in columns (5) and (6) of Table 7.8.
In the array y' = −14, $\bar{x}'_y$ is −11/1, so $n_y\bar{x}'_y$ = −11; for y' = −9,
$\bar{x}'_y$ = −12/2, so $n_y\bar{x}'_y$ = −12, and so on for the other quantities in
column (5); the sum of this column is the sum of all the deviations of
x' from the arbitrary origin, and so should equal the sum of column (9).
The terms of column (6) are the products of those in columns (1)
and (5), and their sum is the required Sx'y'. As a check on the
arithmetic, this may be done the other way, summing first for each
separate array of x' and adding them. This is done in columns (11)
and (12), and if the arithmetic is correct, the totals of columns
(11) and (3) and those of columns (6) and (12) should be equal.
There is no independent check of columns (4) and (10), and so they
should be repeated carefully.

The product moment is calculated below Table 7.8 by applying
equation (7.6) to the arbitrary values and then correcting by equation
(7.62). In finding r and the regressions, we have used p, $\sigma_x$
and $\sigma_y$ in the natural units of centimetres, but it would be just as
easy to maintain arbitrary units throughout and then to correct the
regressions by multiplying by $h_y/h_x$ and $h_x/h_y$. The correlation
coefficient, being a pure number, needs no such correction.
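
As a check on the arithmetic, r and the two regressions can be recomputed from the product moment and the two variances. This is only a sketch: the function name is hypothetical, N/(N − 1) is taken as unity, and the inputs are the values in centimetre units printed beneath Table 7.8 (p = 0.049874, variance of x = 0.028405, variance of y = 0.11069).

```python
import math

def correlation_and_regressions(p, var_x, var_y):
    """Return (r, regression of y on x, regression of x on y) from the product moment and variances."""
    r = p / math.sqrt(var_x * var_y)
    b_yx = p / var_x   # regression of y on x
    b_xy = p / var_y   # regression of x on y
    return r, b_yx, b_xy

r, b_yx, b_xy = correlation_and_regressions(0.049874, 0.028405, 0.11069)
print(f"r = {r:.3f}, y on x = {b_yx:.3f}, x on y = {b_xy:.4f}")
# The product of the two regressions equals r squared, a handy check on hand arithmetic.
print(abs(b_yx * b_xy - r * r) < 1e-12)
```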

Regression Lines and Sums of Squares: Proofs 

7.6. Let x and y be the individual values of the variate with means
$\bar{x}$ and $\bar{y}$; then it is required to find the constants a and b of the
following equation:

$$Y = ax + b, \tag{7.7}$$

such that $S(y - Y)^2$ is a minimum.

From (7.7)

$$S(y - Y)^2 = S(y - ax - b)^2,$$
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+1 226 


[TABLE 7.8. The body of the correlation table and its working columns (1) to (12) cannot be recovered; the computations printed beneath the table follow, with 955 as the number of observations.]

Girth, y ($h_y$ = 0.1 cm):

    mean y' = +1 226/955 = +1.28377
    mean girth = 11.25 + 0.128 = 11.378 cm
    12 224/955                 =  12.8000
    (mean y')²                 =  −1.6481
                               =  11.1519
    Sheppard's correction      =  −0.0833
                               =  11.0686
    σy² = hy² × 11.0686 = 0.11069 cm²
    σy = 0.33270 cm

Length, x ($h_x$ = 0.05 cm):

    mean x' = +1 241/955 = +1.29948
    mean length = 4.125 + 0.065 = 4.190 cm
    12 543/955                 =  13.1340
    (mean x')²                 =  −1.6886
                               =  11.4454
    Sheppard's correction      =  −0.0833
                               =  11.3621
    σx² = hx² × 11.3621 = 0.028405 cm²
    σx = 0.16854 cm

Product moment:

    +11 119/955                =  11.6429
    (mean x')(mean y')         =  −1.6682
                               =  +9.9747
    p = hx hy × 9.9747 = +0.049874 cm²
    r = +0.049874/(0.33270 × 0.16854) = +0.889

Regressions:

    y on x = +0.049874/0.028405 = +1.756 cm/cm
    x on y = +0.049874/0.11069  = +0.4506 cm/cm
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and differentiating this with respect to a and b respectively, and 
equating to zero (the usual method of evaluating constants to reduce
a function of them to a minimum), we have

$$-2Sx(y - ax - b) = 0,$$
$$-2S(y - ax - b) = 0;$$

or expanding term by term, and cancelling out common terms,

$$\left.\begin{aligned} Sxy &= aSx^2 + bSx, \\ Sy &= aSx + bS1. \end{aligned}\right\} \tag{7.8}$$

Now $Sx = N\bar{x}$, $Sy = N\bar{y}$ and $S1 = N$, and using these we may
solve (7.8) to obtain

$$a = \frac{Sxy - N\bar{x}\bar{y}}{Sx^2 - N\bar{x}^2} \quad\text{and}\quad b = \bar{y} - a\bar{x}. \tag{7.81}$$

We have seen that $Sxy - N\bar{x}\bar{y} = S(x - \bar{x})(y - \bar{y})$ and $Sx^2 - N\bar{x}^2 = S(x - \bar{x})^2$;
using these in (7.81) and substituting in (7.7) we
obtain

$$Y - \bar{y} = \frac{S(x - \bar{x})(y - \bar{y})}{S(x - \bar{x})^2}\,(x - \bar{x}), \tag{7.71}$$

which reduces to regression equation (7.1), when the value for r
given in equation (7.4) and the standard deviations are substituted
for the sums of products and squares.
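
The solution of the normal equations can be written out directly. The sketch below follows the two results a = (Sxy − N·x̄·ȳ)/(Sx² − N·x̄²) and b = ȳ − a·x̄; the function name and the five data points are hypothetical and serve only to show that the fitted constants satisfy both normal equations.

```python
def fit_line(xs, ys):
    """Least-squares constants a and b of Y = a*x + b, from the normal equations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    a = (sxy - n * x_bar * y_bar) / (sxx - n * x_bar ** 2)
    b = y_bar - a * x_bar
    return a, b

# Hypothetical observations, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
# Both normal equations hold: S(y - ax - b) = 0 and Sx(y - ax - b) = 0.
print(a, b, sum(residuals), sum(x * e for x, e in zip(xs, residuals)))
```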

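Rewriting equation (7.3) by substituting the regression value of
equation (7.1) for Y, we have

$$S(y - \bar{y})^2 = S\left\{(y - \bar{y}) - r\frac{\sigma_y}{\sigma_x}(x - \bar{x})\right\}^2 + r^2\frac{\sigma_y^2}{\sigma_x^2}S(x - \bar{x})^2. \tag{7.9}$$

Now

$$\frac{\sigma_y^2}{\sigma_x^2} = \frac{S(y - \bar{y})^2}{S(x - \bar{x})^2},$$

and the second term on the right-hand side of equation (7.9) becomes

$$r^2 S(y - \bar{y})^2.$$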



ters. 
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squares 








between reaction time to sight and head length is −0.034, with a
standard error of 0.015. The value is a little more than twice its
standard error, and so is suggestive of a real, although exceedingly



took the measurements, the same one always measuring 
acters on the same individual. For the maximum and minimum
temperatures of Table 7.4, N = 1 518, r = +0.30, and being more
than ten times its standard error

$$\left(\frac{1}{\sqrt{1\,518}} = \pm 0.026\right)$$

points which lie anywhere near a straight line does not say that there 
is no relationship but assumes one and draws his line. This is the 
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to important changes in the variance absorbed by the regression line. 


Significance of Differences between Correlations

8.21. Because its distribution is more nearly normal, Fisher recommends
the z' transformation when testing the significance of the
deviation of an observed correlation from zero; but the effect of the
transformation is small for small values of r, and this step is not
important. Differences of observed correlations from some assumed
population value, or between pairs of observed correlations, may be
* It may be well to mention here a fairly common fallacy in the use of the
standard error for testing the significance of an observed coefficient, r. As
an approximation, $(1 - r^2)/\sqrt{N - 1}$ is often written for the unknown
$(1 - \rho^2)/\sqrt{N - 1}$ and if any observed correlation is greater than twice the
former standard error it is judged to be significant; but if the hypothesis
being tested is that $\rho = 0$, the true standard error of r is $1/\sqrt{N - 1}$ as
shown in the previous section.
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tested by using z’ and its standard error just as we did the mean 
and its standard error in large samples. 

For example, Snow (1911) gives the correlation coefficient between
brothers and sisters, measured for a variety of characters, as 0.521,
while Fisher (1928) found from 34 pairs of unlike sex from triplet
births the correlation coefficient for cubit to be 0.645; can this
difference be attributed to random errors? The values of z' are
0.578 and 0.767, and the standard error of the second is

$$\pm\frac{1}{\sqrt{31}} = \pm 0.180$$

(we assume the first determination to be the exact population
value, since it is based on many more observations); the difference
is less than twice the standard error and is insignificant.

In this example, deviations of z' of 0.360 from the population
value (0.578) lie on the 0.05 level of significance, giving values of
z' on this level of 0.218 and 0.938; the corresponding values of r
are 0.214 and 0.734. Thus the chances are about 20:1 against a
random sample of 34 pairs giving values of r outside these limits,
but we regard values anywhere between them as common experience;
the deviations of the limits of r from 0.521 are −0.307 and +0.213,
and their inequality shows the skewness of its distribution.

The following is an example of the use of z' to test the significance
of the difference between two observed correlations. Table 7.2 shows
the correlation between number of stamens and pistils measured on
373 flowers collected late in the season to be 0.745, while a sample
of 268 flowers collected earlier gave a correlation coefficient of 0.506;
making the transformation, we find z' = 0.962 and 0.557, and the
difference, being 0.405 with a standard error of

$$\sqrt{\frac{1}{370} + \frac{1}{265}} = 0.080,$$

is quite significant.

The Combination of Estimates of a Correlation 

8.22. It sometimes happens that we have a number of correlation
coefficients estimated from several small samples from populations
having the same coefficient, and that we wish to combine them to
obtain a mean. This is best done by making the transformation,
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finding the mean and then re-transforming back to r. If the 
samples are not of equal size, we naturally give the larger ones more
weight than the smaller ones; but the weights should be proportional,
not to N, the size of the sample, but to (N − 3). Thus if
$z'_1, z'_2, \ldots$ are the individual estimates based on samples of $N_1, N_2, \ldots$
pairs of observations, the weighted mean is

$$\bar{z}' = \frac{(N_1 - 3)z'_1 + (N_2 - 3)z'_2 + \ldots}{(N_1 - 3) + (N_2 - 3) + \ldots}$$

and the standard error of $\bar{z}'$ is

$$\frac{1}{\sqrt{(N_1 - 3) + (N_2 - 3) + \ldots}}.*$$

Table 8.2 contains the correlation coefficients between cephalic
index and upper face form for samples of skulls belonging to thirteen
races (Tschepourkowsky, 1905). The values of z' are in the fourth
column, and their mean, when weighted with (N − 3), and standard
error are

$$\bar{z}' = \frac{-42.408}{S(N - 3)} = -0.0609, \quad\text{and}\quad \frac{1}{\sqrt{S(N - 3)}} = \pm 0.0379.$$

The mean z' is not significant.

If an improved estimate of the correlation is obtained by com¬ 
bining a large number of very small samples in this way, there is a 
possibility of serious error in z' due to the fact that its distribution is 
not quite normal and there is a slight bias. For this reason, the 
number of coefficients combined should be small compared with 
the average size of sample. 
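
The weighting by (N − 3) may be sketched as follows. The function name and the three samples are hypothetical and are not taken from Table 8.2; the sketch assumes only the weighted mean and standard error given in this section.

```python
import math

def combine_correlations(samples):
    """Combine (r, n_pairs) estimates; return (mean r, mean z', standard error of mean z')."""
    weights = [n - 3 for _, n in samples]
    total = sum(weights)
    z_mean = sum(w * math.atanh(r) for w, (r, _) in zip(weights, samples)) / total
    return math.tanh(z_mean), z_mean, 1.0 / math.sqrt(total)

# Hypothetical samples: (correlation, number of pairs).
samples = [(0.30, 23), (0.50, 43), (0.40, 13)]
r_mean, z_mean, se = combine_correlations(samples)
print(f"mean z' = {z_mean:.4f} with s.e. {se:.4f}; re-transformed r = {r_mean:.4f}")
```

The averaging is done on z' and only the final mean is transformed back to r, which is the order of operations the text recommends.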


The Significance of Groups of Correlations 

8.23. If we have a group of correlation coefficients, however, and
wish to see if, as a whole, they show association, a test involving z'
and $\chi^2$ may be used. Since the standard error of z' is $1/\sqrt{N - 3}$,
the quantity $z'\sqrt{N - 3}$ is distributed approximately normally and

* In the language of section 3.7, $N_1 - 3$, $N_2 - 3$, etc., are the quantities of
information given by the samples. In finding the mean of z', these quantities
are the weights, and since the quantities are additive, the variance of the
mean z' is the inverse of the total quantity of information.


































































We have seen in section 8.1 how the analysis of variance
and the z and t-tests may be used for testing the
significance of a correlation coefficient; here we shall extend these
methods to testing a number of hypotheses concerning regression
coefficients and lines. The hypotheses will be given in italics at the
heads of the sections.

tests: That the population value of the regression coefficient

same as postulating a population with zero correlation,
of section 8.1 may be used. However, we shall express
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significantly different. Whether or not the means for the three ages 
are different does not here concern us, nor the possibility of any
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variations in the residual variance of concrete strength from one age 
to another. 

TABLE 8.4

Crushing Strength of Mortar and Concrete, lb. per sq. in. ÷ 10

                         One Day            Seven Days         Twenty-eight Days
Cement                Mortar  Concrete    Mortar  Concrete     Mortar  Concrete
                        x        y          x        y           x        y

A                      263      138        750      507         895      679
B                      493      278        936      653       1 066      818
C                      137       49        453      425         632      651
D                      477      293        893      662       1 100      842
E                      233      104        545      437         716      603
F                      568      350        797      735         897      832
G                      230      141        631      439         846      724

S'(x − x̄s)²               164 786            193 074             173 057
S'(y − ȳs)²                76 259             99 933              55 482
S'(x − x̄s)(y − ȳs)        111 205            117 648              82 646

We may eliminate the mean strengths for the three ages by
measuring the individual strengths as deviations from those means.
Then we may either (1) fit one regression line to all the deviations,
which is equivalent to fitting a series of parallel lines going through
the means for the various ages, or (2) fit separate regression lines
for the ages. The data are plotted in Fig. 11, where the parallel
lines of system (1) are full lines and the separate ones of system (2)
are dotted. The dotted lines fit the data more closely than the full
ones in the sense that the sum of squares of deviations of concrete
strength is less from the dotted lines. The separate regressions are
only significantly different, however, if this difference in sums of
squares is real after allowing for degrees of freedom absorbed in
fitting and for random errors. This is tested by calculating residual
variances. If the hypothesis is correct, the lines with separate slopes
will give the same residual variance as those with the common slope,
within the limits of sampling errors; if the second residual variance
is significantly less than the first, the hypothesis is rejected and it
is inferred that the separate lines give a closer representation of the
* This follows from the proof by Fisher (1925) mentioned in section 7.23. 



[Fig. 11. Horizontal axis: Crushing Strength of Mortar, 1 000 lb. per sq. in.]

the residual sums of squares and corresponding degrees of freedom
after fitting the lines are given by equation (7.92) as:

Sums of squares:

$$S'(y - \bar{y}_1)^2 - \frac{[S'(x - \bar{x}_1)(y - \bar{y}_1)]^2}{S'(x - \bar{x}_1)^2}, \quad S'(y - \bar{y}_2)^2 - \frac{[S'(x - \bar{x}_2)(y - \bar{y}_2)]^2}{S'(x - \bar{x}_2)^2}, \quad \text{etc.,}$$

Degrees of freedom: $N_1 - 2$, $N_2 - 2$, etc.,

from (8.42) and (8.4) and tested for significance for degrees of
freedom, $n_1 = u - 1$, $n_2 = N_1 + N_2 + \ldots - 2u$.






























8.34. Two samples are from populations having the same regression
coefficient.

The above test may be reduced to the t-test when there are only
two samples. The variance from (8.42) is, in this instance, based
on one degree of freedom, and the square root of its ratio to the variance
from (8.4) is t.
If the residual variance estimated from (8.4) is written $s^2$ and the
regression coefficients of the two samples are $a_1$ and $a_2$, it is easy to
show from equations (7.5), (8.4) and (8.42) that

$$t = \frac{a_1 - a_2}{s\sqrt{\dfrac{1}{S'(x - \bar{x}_1)^2} + \dfrac{1}{S'(x - \bar{x}_2)^2}}},$$

where s is estimated from $N_1 + N_2 - 4$ degrees of freedom.

This test is parallel to that given in section 5.3 for the difference
between two means. It follows from this that the standard error of
the difference between the two regression coefficients may be derived
from those of the separate coefficients by the standard method, using
equation (3.1), p. 72.

As an example we may consider the seven and twenty-eight day
results of Table 8.4. The residual variance is obtained from
equation (8.4) applied to the last two ages, and is 4 426 based on
10 degrees of freedom. The standard error of the difference between
the two regression coefficients is

* All results for this example are in units of 10 lb. per sq. in. 
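
The arithmetic of this example can be retraced from the sums of squares and products at the foot of Table 8.4. The sketch below assumes that the residual sum of squares for each age is S'(y − ȳ)² − [S'(x − x̄)(y − ȳ)]²/S'(x − x̄)², and that t is the difference of the two slopes divided by s√(1/S'(x − x̄₁)² + 1/S'(x − x̄₂)²); the function names are hypothetical, and only the residual variance of 4 426 on 10 degrees of freedom is a figure quoted in the text.

```python
import math

def residual_ss(sxx, syy, sxy):
    """Residual sum of squares of y about its own regression line."""
    return syy - sxy ** 2 / sxx

def compare_slopes(sums1, n1, sums2, n2):
    """Each sums tuple is (Sxx, Syy, Sxy). Return (s^2, d.f., slope difference, s.e., t)."""
    df = n1 + n2 - 4
    s2 = (residual_ss(*sums1) + residual_ss(*sums2)) / df
    a1 = sums1[2] / sums1[0]
    a2 = sums2[2] / sums2[0]
    se = math.sqrt(s2 * (1.0 / sums1[0] + 1.0 / sums2[0]))
    return s2, df, a1 - a2, se, (a1 - a2) / se

seven_days = (193074, 99933, 117648)        # Sxx, Syy, Sxy from Table 8.4
twenty_eight_days = (173057, 55482, 82646)
s2, df, diff, se, t = compare_slopes(seven_days, 7, twenty_eight_days, 7)
print(f"residual variance {s2:.0f} on {df} d.f.; difference {diff:.3f}, s.e. {se:.3f}, t = {t:.2f}")
```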























































Sheppard's Corrections

8.4. In calculating correlation coefficients from samples of grouped
data, Sheppard's corrections are often used in determining $\sigma_x$ and
$\sigma_y$, and they tend to increase the value of r. For this reason it is
better to ignore them in making tests of significance, particularly
when the grouping is broad, although their use probably leads to an
estimate of r which is nearer the population value.
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sum O' 


Hence $r^2$ is the proportion of the total sum of squares of x associated
with y when a straight line is fitted, and $\eta^2$ is the proportion associated
when a series of array means or curve is fitted; just as there are two
regression lines (for which r happens to be the same), there are two
series of array means which require separate correlation ratios. If
the sample is large compared with the number of arrays and we may
neglect the degrees of freedom absorbed, and if $\sigma_x$ is the standard
deviation of x, and $\sigma_r$ that of the residual deviations, we have as an
approximation,

for linear regression $\sigma_r^2 = (1 - r^2)\sigma_x^2$,

for curved regression $\sigma_r^2 = (1 - \eta^2)\sigma_x^2$.

In large samples, therefore, $\eta$ may be interpreted as a measure of
association on practically the same scale as r. We have, however,
made the proviso that the number of arrays shall be small compared

















Number of ovules per ovary,

$$1 - \eta^2 = \frac{3\,026.350}{5\,379.115}; \quad \eta^2 = 0.437, \quad \eta = 0.66,$$

Length of cuckoos' eggs,

$$1 - \eta^2 = \frac{2\,356.21}{3\,429.70}; \quad \eta^2 = 0.313, \quad \eta = 0.56,$$

and the strength of association in the former case comes somewhere 
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from normal, but that does not invalidate our test, which depends 
only on the assumption that the distribution of loaf volume within 
the arrays is practically normal. As far as can be judged from a 
sample of this size, this assumption is justified. 


Polynomial Regression Curves

9.3. Sometimes it is convenient to represent the regression of y on x
by a smooth curve described by an equation with several constants
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mean of y. Curves of the first, second, third and higher orders imy 
be fitted successively if desired, and as more and more terms are
used, the equation fits the observations more and more closely, until
ultimately an equation with as many constants as there are observations
does so perfectly. This progressive improvement in the closeness
of fit may be seen by finding the deviations of the observed values
from those given by the regression and squaring and adding them,
i.e. by finding $S(y - Y)^2$. As more and more terms are used this
sum of squares decreases until when the fit becomes perfect, it
becomes zero. If the constants are determined by the method of
least squares and the fluctuations in y are purely random, the reduction
in sum of squares at each stage tends to be the same; it is the
variance associated with one degree of freedom, differing from the
true variance only by the extent of the random errors. If the data
show a comparatively simple form of variation, predominating over
the random fluctuations, the earlier terms of the equation, which
express the simpler movements, reduce the sum of squares by an
amount significantly greater than the later ones. When sufficient terms
are included to express this slow movement of y with x, and the
residual deviations are random, each of the remaining possible terms
reduces the remaining sum of squares by the same amount, within
the limits of random errors. Thus, at each stage in fitting equations
with more terms, the reduction in residual sum of squares due to
one degree of freedom can be compared with the variance estimated
from the residual; if the former is significantly greater than the
latter, the last term expresses a real feature of the trend. When the
stage is reached that the variances associated with the last one or
two terms are not significantly greater than that estimated from the
residual, it is presumed that the regression has been adequately
expressed and the deviations are random. This presumption is only
justified, however, if the polynomial is the form appropriate for
describing the regression. This technique of fitting and testing curves
should not be followed blindly, and a certain amount of judgment
and reference to a diagram may be necessary; it is possible that the
first term or two may account for very little of the variance, but
that some of the later ones may be very important; this would be
the case in the curve of Fig. 12, where the linear term would be
very small but the second one would be important. On the other
hand, if a very large number of terms is required, it may be because
the polynomial form is not suitable. A regression equation fitted to
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any data must not only satisfy these statistical tests^ but it must 
appear appropriate when plotted on a diagram. 

Every time an equation with an additional constant is tried,
equations (9.4) have to be solved afresh and new values of all the
constants obtained. Thus, a third order curve requires different
values of a and b from those appropriate to the second order. The
equation (9.3) may be expressed in a so-called orthogonal form consisting
of a number of terms of successively higher orders in x, each
term being multiplied by a constant to be determined from the
data. These terms are independent in that they may be determined
one at a time, and the addition of the higher orders does not alter
the constants of the lower orders previously determined. This course
is easiest when the values of the independent variable are at equal
intervals and there is one value of the dependent for each value of
the independent variable, as in a time series. Fisher (1936) describes
a convenient system for doing this, and we shall illustrate it by
fitting a curve of the third degree to the protein content data of
Table 8.1.
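
The independence of the orthogonal terms can be illustrated numerically. The sketch below is not Fisher's tabular system: it builds the orthogonal terms by a plain Gram-Schmidt process over the observed values of the independent variable, which is assumed here as a convenient stand-in. The function name and the data are hypothetical.

```python
def orthogonal_fit(ts, ys, degree):
    """Fit y on orthogonal polynomial terms in t; return (constants, sum-of-squares reductions)."""
    basis, constants, reductions = [], [], []
    for k in range(degree + 1):
        term = [t ** k for t in ts]
        # Remove from t**k its components along the lower-order terms.
        for q in basis:
            proj = sum(a * b for a, b in zip(term, q)) / sum(b * b for b in q)
            term = [a - proj * b for a, b in zip(term, q)]
        norm = sum(a * a for a in term)
        c = sum(y * a for y, a in zip(ys, term)) / norm
        basis.append(term)
        constants.append(c)
        reductions.append(c * c * norm)  # sum of squares accounted for by this term
    return constants, reductions

# Hypothetical series at equal intervals, centred on t = 0.
ts = [-3, -2, -1, 0, 1, 2, 3]
ys = [2 + 0.5 * t + 0.1 * t * t for t in ts]
linear, _ = orthogonal_fit(ts, ys, 1)
cubic, reductions = orthogonal_fit(ts, ys, 3)
print(linear, cubic[:2])   # the lower-order constants are unchanged by adding terms
print(reductions)          # the first entry belongs to the mean; the cubic term adds nothing here
```

Each entry of the second list after the first is the reduction in the residual sum of squares due to one further degree of freedom, which is the quantity compared with the residual variance in the test described above.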


9.31. We shall call the mean protein content (the observed values)
y, and the year measured from the middle year 1910, t (i.e. for 1908,
t = −2), and will fit a curve of the form,

$$Y = a + bt + ct^2 + dt^3$$

to the 29 values (N = 29).*
This Fisher transforms to

$$Y = A + BT_1 + CT_2 + DT_3,$$

where

$$T_1 = (t - \bar{t}) = t \quad (\text{since } \bar{t} = 0),$$

$$T_2 = t^2 - \frac{N^2 - 1}{12} = t^2 - 70,$$

and

$$T_3 = t^3 - \frac{3N^2 - 7}{20}\,t = t^3 - 125.8\,t.$$

* If there is an even number of years, t = 0 must still be at the centre,
so that the values of t must be +0.5, +1.5, ..., −0.5, −1.5, ....
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The constants are given by the relations 



$$\left.\begin{aligned}
A &= \bar{y} = \frac{1}{N}\,Sy, \\
B &= \frac{12}{N(N^2 - 1)}\,SyT_1 = \frac{1}{2\,030}\,Syt, \\
C &= \frac{180}{N(N^2 - 1)(N^2 - 4)}\,SyT_2 = \frac{1}{113\,274}\{Syt^2 - 70\,Sy\}, \\
D &= \frac{2\,800}{N(N^2 - 1)(N^2 - 4)(N^2 - 9)}\,SyT_3 = \frac{5}{30\,292\,704}\{Syt^3 - 125.8\,Syt\}.
\end{aligned}\right\} \tag{9.7}$$

The summations Sy, Syt, etc., have to be found from the data, and
Fisher gives a convenient method of evaluating them by a series of
additions, but for such a small number of observations as we have,
it is not very much trouble to obtain the summations directly,
particularly if a calculating machine is available. For the protein
percentages they are,

    Sy1    =        411.48        Sy2    =        240.78
    Sy1t   =   +    342.98        Sy2t   =   −    195.36
    Sy1t²  =     28 408.36        Sy2t²  =     17 674.16
    Sy1t³  =   + 49 329.62        Sy2t³  =   − 26 716.68,

If ${}_{1}y_1, {}_{2}y_1, \ldots, {}_{29}y_1$ are the individual readings,

$$Sy_1t = ({}_{29}y_1 - {}_{1}y_1)\cdot 14 + ({}_{28}y_1 - {}_{2}y_1)\cdot 13 + \ldots + ({}_{16}y_1 - {}_{14}y_1)\cdot 1,$$

$$Sy_1t^2 = ({}_{29}y_1 + {}_{1}y_1)\cdot 14^2 + ({}_{28}y_1 + {}_{2}y_1)\cdot 13^2 + \ldots + ({}_{16}y_1 + {}_{14}y_1)\cdot 1^2,$$

$$Sy_1t^3 = ({}_{29}y_1 - {}_{1}y_1)\cdot 14^3 + ({}_{28}y_1 - {}_{2}y_1)\cdot 13^3 + \ldots + ({}_{16}y_1 - {}_{14}y_1)\cdot 1^3,$$

etc., when the values of t are symmetrically placed about the zero; the
labour is diminished if the terms of the brackets are found first,
giving two series of fourteen values, one for use when multiplying
by the odd powers of t and the other when multiplying by the even
powers.

Substituting the above values in (9.7) we obtain,

$$A_1 = \frac{411.48}{29} = 14.18897, \qquad A_2 = \frac{240.78}{29} = 8.30276,$$
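
The remaining constants follow in the same way from the four sums of each series. The sketch below assumes the orthogonal terms t, t² − (N² − 1)/12 and t³ − (3N² − 7)t/20 for equally spaced values of t centred on zero, with the divisors N(N² − 1)/12, N(N² − 1)(N² − 4)/180 and N(N² − 1)(N² − 4)(N² − 9)/2 800. The function name is hypothetical, and the values of B, C and D it prints are computed from the quoted sums rather than copied from the text.

```python
def orthogonal_constants(n, sy, syt, syt2, syt3):
    """Constants A, B, C, D of Y = A + B*T1 + C*T2 + D*T3 for n equally spaced, centred values of t."""
    k2 = (n * n - 1) / 12        # 70 when n = 29
    k3 = (3 * n * n - 7) / 20    # 125.8 when n = 29
    a = sy / n
    b = 12 * syt / (n * (n * n - 1))
    c = 180 * (syt2 - k2 * sy) / (n * (n * n - 1) * (n * n - 4))
    d = 2800 * (syt3 - k3 * syt) / (n * (n * n - 1) * (n * n - 4) * (n * n - 9))
    return a, b, c, d

first = orthogonal_constants(29, 411.48, 342.98, 28408.36, 49329.62)
second = orthogonal_constants(29, 240.78, -195.36, 17674.16, -26716.68)
print("first series: ", first)
print("second series:", second)
```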



the residuals, and we may test their significance by finding

? = iiog,( etc., 

Vo-4947 


etc., 
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totals coiiimn, aad irom iabie o,z we see mat mis gives as an 
estimate of standard error the value 


^ 0*0^2 4. 

1000 

We know this to be incorrect, since the ovaries are not independent. 
However, the 10 shrub-means are independent and may be regarded 
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Subject to the limitations of a small sample, this estimate is correct^, 
and is about seven times the incorrect estimate. 

It may be seen how this difference arises when allowance is made 
for the fact that the standard error of the mean is made up of two 
parts, one due to the true variance between shrubs, $\sigma_s^2$, and the 
other due to the variance within a shrub, $\sigma_r^2$. If there are m shrubs 
and n ovaries per shrub, the first source of variation is only represented 
by m individuals and the second by mn, and therefore we have 

$\text{Standard Error of Mean} = \sqrt{\dfrac{\sigma_s^2}{m} + \dfrac{\sigma_r^2}{mn}}, \qquad (10.1)$

and from equation (6.3), p. 129, we see that the best estimate of 
this is $\sqrt{v_s/nm}$. The simple theory assumes that both sources of 
variation are sampled independently mn times. If we took 1 000 
ovaries one at a time from the population of shrubs, leaving it to 
chance from which shrubs they came, this condition would be 
satisfied and we should have 

$\text{Standard Error of Mean} = \sqrt{\dfrac{\sigma_s^2 + \sigma_r^2}{mn}}. \qquad (10.2)$

If the estimates of $\sigma_s^2$ and $\sigma_r^2$ given in section 6.2 are substituted 
in (10.2) the result differs only slightly from the first estimate of 
the standard error given in this chapter. 
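The contrast between the two standard errors can be made concrete with a small Python sketch. The two variances used below are hypothetical round figures chosen only for illustration, not the estimates of section 6.2; m = 10 shrubs and n = 100 ovaries per shrub follow the example in the text.

```python
import math


def se_clustered(var_between, var_within, m, n):
    """Standard error of the mean when n individuals come from each of m units."""
    return math.sqrt(var_between / m + var_within / (m * n))


def se_simple_random(var_between, var_within, m, n):
    """Standard error if all m*n individuals were sampled independently."""
    return math.sqrt((var_between + var_within) / (m * n))


# Hypothetical variances (assumed for illustration only).
var_between, var_within = 2.5, 3.0
m, n = 10, 100

se_units = se_clustered(var_between, var_within, m, n)
se_naive = se_simple_random(var_between, var_within, m, n)
ratio = se_units / se_naive
```

With figures of this order the clustered standard error is several times the naive one, because the between-unit variance is divided only by m and not by mn.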

Complex samples of the kind exemplified by the 1 000 ovaries, 
with 100 from each shrub, are not uncommon, and standard errors 
may easily be estimated if the sample can be divided into a number 
of equal, independent sub-samples or units like the shrubs. For 
example, the 500 pairs of brothers mentioned in section 3.1 are 
500 independent units to which the sampling theory applies. To 
find the standard error of the mean character of the sample of cotton 
hairs collected in tufts as described in section 3.1, it would be 
necessary to measure the mean character separately for each of the 
reduced tufts, say 64 in number, and then calculate the standard 
error of the grand mean of these 64 tuft means. Sometimes, the 
sub-samples themselves are not independent, but have to be grouped 
into yet larger units, and there may be a whole hierarchy of units, 
only the largest of which are independent. Thus, eight tufts of 100 
cotton hairs may be taken from each of eight bales to sample a 


Standard Error of Mea 
and then made several counts on each boring; if there were no other 
considerations, it would be best to take many borings and to make 
one count on each, but we must take account of the fact that it may 
take more time to obtain a boring than to make a count on one. If 
there are n counts on each of m borings, $\sigma_b^2$ is the variance between 
borings in the infinite population, $\sigma_c^2$ that of counts within a boring, 
t the time to make a count and kt the time to make a boring, we have 
for the total time to obtain the N = mn counts, 

$T = mnt + mkt, \qquad (10.3)$

and the square of the standard error of the grand mean, 

$\sigma_N^2 = \dfrac{\sigma_b^2}{m} + \dfrac{\sigma_c^2}{mn}. \qquad (10.31)$

Now from (10.3), 

$mn = \dfrac{T}{t} - mk,$

and substituting this in (10.31) we obtain 

$\sigma_N^2 = \dfrac{\sigma_b^2}{m} + \dfrac{\sigma_c^2}{\dfrac{T}{t} - mk}.$

In order to find the best way of distributing a constant time T 
between taking borings and making counts, we must find the value 
of m which makes $\sigma_N^2$ a minimum. Performing this operation in the 
usual way by making 

$\dfrac{\partial \sigma_N^2}{\partial m} = 0$

and substituting, we obtain 

$n^2 = \dfrac{k\,\sigma_c^2}{\sigma_b^2}. \qquad (10.32)$

Smith and Prentice found estimates for $\sigma_b$ and $\sigma_c$ to be 40.5 and 
23.1 per cent. of the mean, and if we assume it takes five times as 
long to take a boring as to make a count, we find from (10.32) that 
the time is best employed when n = 1.3; i.e. under such circumstances 
it would be better to increase the number of borings than to 
take more than two counts on each one. Examples of this sort arise 
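The optimum quoted from (10.32) can be checked numerically. The sketch below evaluates the closed form with the figures given for Smith and Prentice's data and compares it with a crude search over n at a fixed total time; the function names, the search grid and the total time of 1 000 units are assumptions made for the illustration.

```python
import math


def optimal_counts_per_boring(sd_between, sd_within, k):
    """n from (10.32): n^2 = k * sd_within^2 / sd_between^2."""
    return math.sqrt(k) * sd_within / sd_between


def best_n_by_search(sd_between, sd_within, k, total_time=1000.0, t=1.0):
    """Search n = 0.01, 0.02, ..., 10 for the smallest squared standard error.

    For each n the number of borings m follows from T = m*n*t + m*k*t;
    m is treated as continuous for the purpose of the sketch.
    """
    best_n, best_var = None, None
    for i in range(1, 1001):
        n = i / 100
        m = total_time / (t * (n + k))
        var = sd_between ** 2 / m + sd_within ** 2 / (m * n)
        if best_var is None or var < best_var:
            best_n, best_var = n, var
    return best_n


n_formula = optimal_counts_per_boring(40.5, 23.1, 5)   # about 1.28
n_search = best_n_by_search(40.5, 23.1, 5)
```

The optimum does not depend on the total time T, only on the ratio of the two standard deviations and on k.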




due entirely to within-shrub variation 
variance is 3.057, so the standard error of the mean of N ovaries 
sampled in strata is 

$\sqrt{\dfrac{3.057}{N}},$

or generally, 

$\dfrac{\sigma_r}{\sqrt{N}}.$
In the language of section 3.7, the method of selection by strata is 

$100\,\dfrac{\sigma_s^2}{\sigma_r^2}$ per cent. 

more efficient than purely random selection; for the data of Table 6.1 
this difference in efficiency is 76 per cent. 

The method of sampling by strata has a wide application since 
most populations are limited in extent, even though they may have 
so many individuals that the conception of an infinite population is 
reasonable. For example, an agricultural plot can be divided into 
areas for sampling ears of corn, say; a town can be sampled in 
wards and streets in a sociological survey as suggested in section 3.1; 
or the products of a factory can be divided into strata representing 
the output of different machines, operatives or periods of time. The 
degree of superiority of sampling by strata to simple random 
sampling depends on the variance between the strata compared with 
that within them, and there is an art in choosing the lines of 
division so as to make the ratio as large as possible. A knowledge 
of the sources of variation and of their relative importance such as 
that given by an investigation using the analysis of variance and by 
a priori technical knowledge is of great assistance. When there are 
no real variations between the strata, i.e. when $\sigma_s = 0$, sampling 
by strata is at its worst, and even then it is as good as random 
sampling. It should be noted that unless there are at least two 
individuals from each stratum, $\sigma_r$, and hence the precision of the 
sample, cannot be estimated. 

The Principles of Experimental Arrangement 

10.2. All the foregoing considerations apply to the determination 
of a mean; but quite often we are interested, not in the absolute 
value of the mean, but in the extent to which the mean changes with 
some change in conditions (varied, perhaps, experimentally). From 
the point of view of such experiments, the variations are disturbing 
factors which the observer has to average out by taking large numbers 
of observations. If the variability is heterogeneous, however, it is 
often possible so to arrange the experiment that the group differences 
affect all the conditions or treatments equally, so that only the smaller 
residual variations contribute to the errors in the comparisons, giving 
an increased accuracy for the same number of observations. This is 
the principle on which experiments should be designed. 








have to bear to the number for the restricted arrangement the ratio 

$\dfrac{(0.511\,4)^2}{(0.055\,29)^2}$

or about 86 : 1 (assuming 100 ovaries to be taken from each shrub). 

A simple instance of the advantage of arranging the experiment 
or collection of data so that large variations are eliminated is that of 
comparing the means of two parallel series of observations, and as 

an example we will compare the mean head breadths of termites from 

Actually, Harris compared the ovules per loculus, and we have taken the 
ovary as the unit because its distribution showed the heterogeneity of the 
variance better. However, Harris only dealt with ovaries with three loculi, 
so that it does not affect our argument. 






























$t = \dfrac{d\sqrt{N}}{s} = \dfrac{0.191\,6 \times 2.236}{0.081\,09} = 5.28.$

This increased value of t means that by comparing samples taken in 


there are two series of quantities, $x_1$ and $x_2$, both measured directly 
as deviations from their grand means, the sum of the squares of their 
differences is 

$S(x_1 - x_2)^2 = Sx_1^2 + Sx_2^2 - 2Sx_1x_2.$

When we wrote a similar equation in section 6.1, we assumed the 
variates to be independent and the product term to be zero, but if 
the variates are not independent, we see from equation (7.4), p. 165, 
that 

$Sx_1x_2 = r\sqrt{Sx_1^2\,Sx_2^2}.$

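Substituting in the above equation, dividing by (N - 1) and writing 
$\sigma^2$ for the variances, we obtain 

$\sigma_{1-2}^2 = \sigma_1^2 + \sigma_2^2 - 2r\sigma_1\sigma_2.$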
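The effect of a positive correlation on the sum of squares of the differences can be illustrated as follows. The two short series are hypothetical figures invented for the sketch, not the termite measurements, and the function name is likewise an assumption.

```python
import math


def paired_sums(x1, x2):
    """Sums of squares and products of two parallel series about their means."""
    m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s_diff = sum((a - b) ** 2 for a, b in zip(d1, d2))
    r = s12 / math.sqrt(s11 * s22)
    return s11, s22, s12, s_diff, r


# Hypothetical parallel series (e.g. two groups measured on the same occasions).
series_1 = [2.1, 2.4, 2.2, 2.6, 2.5]
series_2 = [1.9, 2.3, 2.0, 2.5, 2.2]

s11, s22, s12, s_diff, r = paired_sums(series_1, series_2)
n = len(series_1)
var_diff_paired = s_diff / (n - 1)           # uses the pairing
var_diff_independent = (s11 + s22) / (n - 1)  # ignores the product term
```

When r is positive the paired variance is the smaller of the two, which is the gain obtained by comparing parallel series.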


of a difference between two independent means; but in pairs of 
parallel series like the head breadths of termites, r is positive and 
real and the standard error of the difference is thus reduced. If on 













of year effect 
nee. 

^hy we shoul) 

IS, and analys 
ests. Such a 
as Table 6.9, 
ible 10.2 for 
s random devi: 

The significance of the nest variance then becomes very much greater 
even than before, for 

$z = \tfrac{1}{2}\,[\log_e 0.034\,36 - \log_e 0.002\,31] = 1.35,$

and $n_1 = 4$ and $n_2 = 16$. We have thus analysed the total variations 
into three parts, between nests, between months and a residual; 
each of these can be given a place in a larger table of analysis with 
four rows. 
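The value of z quoted above is easily reproduced; the short Python sketch below does so. The function name is an assumption, and the relation between z and the variance ratio noted in the comment is the standard one.

```python
import math


def fisher_z(v_larger, v_smaller):
    """Fisher's z: half the difference of the natural logs of two variances."""
    return 0.5 * (math.log(v_larger) - math.log(v_smaller))


z_nests = fisher_z(0.03436, 0.00231)   # about 1.35
# z is half the log of the variance ratio, so exp(2z) recovers the ratio.
variance_ratio = math.exp(2 * z_nests)
```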

The algebraic relations may be expressed by the equation, 

$(x - \bar{x}) = (\bar{x}_s - \bar{x}) + (\bar{x}_t - \bar{x}) + d, \qquad (10.4)$

where x is an individual reading for the sth row and the tth column,* 
$\bar{x}_s$ is the mean for the sth row, 
$\bar{x}_t$ is the mean for the tth column, 
$\bar{x}$ is the grand mean, and 
$d = x - \bar{x}_s - \bar{x}_t + \bar{x}$ is the residual deviation. 

Then squaring, and summing for all rows and columns, we 
have 

$SS(x - \bar{x})^2 = SS(\bar{x}_s - \bar{x})^2 + SS(\bar{x}_t - \bar{x})^2 + SSd^2,$

since the sums of the product terms come to zero. These are the 
sums of squares that go into the analysis table. If there are $n_t$ rows 
and $n_s$ columns (i.e. $n_s$ observations in each row and $n_t$ in each column, 
total $N = n_s n_t$) the above equation becomes 

$SS(x - \bar{x})^2 = n_s S(\bar{x}_s - \bar{x})^2 + n_t S(\bar{x}_t - \bar{x})^2 + SSd^2. \qquad (10.5)$

Expressed in words, this equation states that the sum of squares 
of deviations of individual observations from the grand mean equals 
the sum of squares of deviations of row means multiplied by the 
number of readings in each row, plus the sum of squares of deviations 
of column means multiplied by the number of readings in each 
column, plus the sum of squares of residual deviations. 
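The identity stated in words above can be verified on any complete table of rows and columns. The following Python sketch does this for a small hypothetical table; the figures and the function name are assumptions made for the illustration.

```python
def two_way_sums_of_squares(table):
    """Split the total sum of squares of a complete rows-by-columns table."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]
    total = sum((x - grand) ** 2 for row in table for x in row)
    # Each row mean rests on n_cols readings, each column mean on n_rows.
    rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    residual = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n_rows) for j in range(n_cols)
    )
    return {"rows": rows, "columns": cols, "residual": residual, "total": total}


# Hypothetical readings: 4 rows by 3 columns.
table = [
    [2.48, 2.51, 2.39],
    [2.55, 2.60, 2.47],
    [2.41, 2.52, 2.38],
    [2.50, 2.58, 2.45],
]
parts = two_way_sums_of_squares(table)
degrees = {"rows": len(table) - 1, "columns": len(table[0]) - 1}
degrees["residual"] = len(table) * len(table[0]) - 1 - degrees["rows"] - degrees["columns"]
```

Dividing each sum of squares by its degrees of freedom gives the variances that are compared by the z test.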

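These are entered in Table 10.3, and symbols are written for the 
variances. If $\sigma_s^2$, $\sigma_t^2$ and $\sigma_r^2$ are the squares of standard deviations 
of row, column and residual variations in the infinite population, 
we obtain as in equations (6.3) 

$v_s \to n_s\sigma_s^2 + \sigma_r^2, \qquad v_t \to n_t\sigma_t^2 + \sigma_r^2, \qquad v_r \to \sigma_r^2. \qquad (10.6)$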

* The conception is generalised by writing row for month and column 
for nest. 

After the significances of the sources of variation have been established 
by the z test carried out on the values of v, it is advisable 
to calculate these estimates of the "corrected variances" as we 
may call them, as measures of the relative importance of the different 
sources. The values of v depend on the number of observations 
per row or column, i.e. on arbitrary features of the sample and so 
are of no value in themselves for measuring the objective properties 
of the population. If a large number of observations were taken 
individually at random from a population comprised of many rows 
and columns, of which the few rows and columns treated in the 
analysis are a random sample, the variance between these individuals 
would tend to equal the value $\sigma_s^2 + \sigma_t^2 + \sigma_r^2$. If any factor were 
eliminated as a source of variation, this variance would be reduced 
TABLE 10.3 
Analysis of Variance 

Source of Variations    Sum of Squares                   Degrees of Freedom     Variance

Rows                    $n_s S(\bar{x}_s - \bar{x})^2$   $n_t - 1$              $v_s$
Columns                 $n_t S(\bar{x}_t - \bar{x})^2$   $n_s - 1$              $v_t$
Residual                $SSd^2$                          $N - n_s - n_t + 1$    $v_r$

Total                   $SS(x - \bar{x})^2$              $N - 1$                -


by the amount of the corresponding variance. For the head breadth 
of termites, 

$\sigma_r^2 \to 0.002\,31$ (residual), 

$\sigma_s^2 \to \dfrac{0.023\,51 - 0.002\,31}{5} = 0.004\,24$ (months), 

$\sigma_t^2 \to \dfrac{0.034\,36 - 0.002\,31}{5} = 0.006\,41$ (nests); 


and the variations due to months and nests are seen to be com¬ 
paratively important. Sometimes, however, when there are many 
rows or columns, an unimportant variation may be highly significant. 
To interpret these variances in terms of normal frequencies, reference 
may be made to section 2.52. 
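The arithmetic of the corrected variances for the termite data is short enough to set out as a Python sketch. The function name is an assumption; the variances and the divisor of 5 are those quoted in the text.

```python
def corrected_variance(v_factor, v_residual, readings_per_mean):
    """Estimate of a factor's own variance, freed from the residual component."""
    return (v_factor - v_residual) / readings_per_mean


v_residual = 0.00231
var_months = corrected_variance(0.02351, v_residual, 5)
var_nests = corrected_variance(0.03436, v_residual, 5)

# Variance expected between individuals taken at random from many months and nests.
var_individuals = var_months + var_nests + v_residual
shares = {
    "months": var_months / var_individuals,
    "nests": var_nests / var_individuals,
    "residual": v_residual / var_individuals,
}
```

The shares show at a glance how much of the variation between individuals each source would remove if it were eliminated.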

The estimates of $\sigma_s^2$ and $\sigma_t^2$ are subject to errors of random 
sampling, but since these corrected variances are differences between 









in some balanced way as in Table 6.8 and to carry out an analysis 
of variance. This represents the type of control by selection and 

arrangement referred to in the Introduction to this book. Analyses 
carried out in this connection may be very complicated, and it will 
be instructive to deal with the following example in detail. 

The data in Table 10.4 are from a paper by Gould and Hampton 
(1936).* From a single "pot," about eighteen cylinders of spectacle 
glass are made on any one day or "journey." Two pots are heated 
together in the same furnace and glass may be made on several 
consecutive days from the same pair of pots, forming a "run." The 
quality measured, or variate, is the mean number of seed per unit 
area of glass, and this is given in Table 10.4 for three cylinders 
from each pot (the third, tenth and sixteenth in order of manufacture) 
and for the first five journeys in each of four runs. There 
is no significance in the numbering of the pots, and the runs are 
independent, being separated by several weeks and sometimes 
referring to different furnaces. The separate figures are given in 
the table, and various totals† have also been computed. 

Now each kind of manufacturing unit—cylinder, pot, journey or 
run—is a potential source of variation, since the causes that control 
quality may be consistently different for each unit. Further, since 
the journeys are consecutive days and the cylinders are in order of 
manufacture, there may be trends in quality. Clearly, the variations 
in density of seed shown in Table 10.4 are the result of the superimposition 
of variations from many possible sources. Precisely what 
are these sources? Which have statistically significant effects? What 
is their relative importance? To answer these questions we shall 
analyse the variations in Table 10.4 completely. 

It would be possible to do this analysis in one step writing down 
equations analogous to equations (10.4) and (10.5), but the notation 
would be complicated and the process would be very difficult to 
follow. We shall break Table 10.4 up into a number of simpler 
tables of kinds that have already been dealt with, and analyse these 
step by step. Readers are warned that they will find this analysis 
very difficult to follow unless they have “at their finger-tips” the 
simpler analyses dealt with so far. 

First let us consider the fifteen readings in one pot-run, say the 
first pot and first run. This is statistically similar to Table 6.8 

* From Table IV of their paper. We give here only four of the five runs 
given there, so as to make the analysis easier to follow. It might lead to 
some ambiguity in the argument if five runs and five journeys were retained. 

† These totals are not total frequencies as given, say, in Table 6.1. They 
are the sums of the qualities of the individuals included in the totals. 
                                        Sum of        Degrees of
Source of Variations                    Squares       Freedom       Variance

(a) Within Pots

(1) Between cylinders                   22 132.53        16         1 383.28**
(2) Between journeys                    37 916.66        32         1 184.90**
(3) Residual within pots                18 398.14        64           287.47

(4) Total                               78 447.33       112           -

(b) Between Pots

(5) Between runs                        13 679.90         3         4 559.96
(6) Residual between pots               10 099.57         4         2 524.89**

(7) Total                               23 779.47         7         3 397.07**

(11) Total

(d) Between Journeys

(12) Common to all runs                  9 684.00         4         2 421.00
(13) Common to both pots in run,
     less (12)                          18 650.07        12         1 554.17

                                        28 334.07        16         1 770.88**

(14) Specific to pot                     9 582.59        16           598.91*

(15) Total                              37 916.66        32           -








pot from the cylinder readings common to both pots. Similar 
analyses may be carried out for the four runs, and the sums of 
squares and degrees of freedom added. When this is done and the 
resulting sums of squares are multiplied by five, to make them 





































and may be divided into parts associated with runs (3 degrees), 
cylinders common to all runs (2 degrees) and a residual (6 degrees) 
representing a differential cylinder trend for each run but common 
to the two pots in the run, i.e. a cylinder-run interaction. The sums 
of squares of these items multiplied by 10 and the degrees of freedom 
are entered in rows (5), (8) and (9) of Table 10.5. 

In a precisely similar way the journey-variations of row (2), 
Table 10.5, may be further analysed, into the parts shown in 
section (d) of that table. 

Before testing the variances in Table 10.5 for significance, it will 
be well to determine of what corrected variances they are estimates. 
Let the variances entered in the table be denoted by v with a 
subscript corresponding to the row of the table, i.e. $v_1$, $v_2$, etc., 
and let the corresponding corrected variances be denoted by $\sigma^2$ with 
a corresponding subscript. Thus, $\sigma_1^2$ is the variance that would be 
found between the cylinder means within each pot if an infinite 
number of observations could be made on each cylinder and there 
were an infinite number of pots. Further, let the number of the 
original individuals contributing to the mean of each individual 
factor be n with an appropriate subscript. For example, $n_1$ is the 
number of readings per cylinder within each pot, = 5, $n_9$ is the 
number of readings per cylinder mean for each run, = 10, and 
so on. Then the values of the n's and the relations between the 
observed and corrected variances, expressed in the manner of 
equations (10.6), are as follows: 

$n_1 = 5$                  $v_1 \to n_1\sigma_1^2 + \sigma_3^2$
$n_2 = 3$                  $v_2 \to n_2\sigma_2^2 + \sigma_3^2$
$n_3 = 1$                  $v_3 \to \sigma_3^2$
$n_5 = 30$                 $v_5 \to n_5\sigma_5^2 + v_6$
$n_6 = 15$                 $v_6 \to n_6\sigma_6^2 + \sigma_3^2$
$n_8 = 40$                 $v_8 \to n_8\sigma_8^2 + v_9$
$n_9 = 10$                 $v_9 \to n_9\sigma_9^2 + v_{10}$
$n_{10} = n_1 = 5$         $v_{10} \to n_{10}\sigma_{10}^2 + \sigma_3^2$
$n_{12} = 24$              $v_{12} \to n_{12}\sigma_{12}^2 + v_{13}$
$n_{13} = 6$               $v_{13} \to n_{13}\sigma_{13}^2 + v_{14}$
$n_{14} = n_2 = 3$         $v_{14} \to n_{14}\sigma_{14}^2 + \sigma_3^2$


We shall now proceed to discuss the derivation of these expressions. 

The relations between $v_1$, $v_2$ and $v_3$ are the same as those between 
the variances of Table 10.3, except that we have combined the 
estimates of several tables. It will be noted that the cylinder and 
journey effects may vary from one pot or run to another and $\sigma_1^2$ and 
$\sigma_2^2$ may themselves be complex. The residual variance $\sigma_3^2$ is due 
partly to the fact that only a limited number of sections on each 
cylinder were examined for seed and partly to a real variation that 
could not be associated with any factor. 

The relation between $v_5$ and $v_6$ follows from equation (10.6), and 
from the fact that after performing the analysis on the pot means, 
we multiplied the sums of squares by the number of individuals 
per pot. Thus, when the analysis is performed on the pot means, 
the n of equation (10.6) is the number of pots per run = 2; but 
we multiplied the sums of squares by the number of readings per 
pot, = 15, so in the above equation, $\sigma_5^2$ is multiplied by the product 



like that in Table 10.3. The apparent variance $v_{10}$ is contributed 
to only by the part of the cylinder trend that is specific to the pot, 
$\sigma_{10}^2$, and by the first residual, $\sigma_3^2$, and the relation given above is 
easily deduced. Similar remarks apply to the journey effect. 

We are now in a position to test the variances given in Table 10.5 
for statistical significance, using Fisher's z-test. If there is no 
cylinder effect, $\sigma_1^2 = 0$ and $v_1 \to \sigma_3^2$; the test for this effect consists 
in establishing the statistical significance of the difference between 
$v_1$ and $v_3$. Similarly, $v_2$ should be compared with $v_3$, $v_5$ with $v_6$, 
$v_6$ with $v_3$, $v_8$ with $v_9$ and so on. When these tests are carried out, 
the values of z corresponding to the variances marked with two 
asterisks are above the 1 per cent. point and that with one asterisk 
is between the 5 and 1 per cent. points; we shall presume that 
these show the existence of significant effects;* all the other values 
of z are below the 5 per cent. point. 

The variances $v_1$ and $v_2$ are both significantly greater than $v_3$; 
if this had not been so, we should still have been justified in performing 
the further analyses of sections (c) and (d) of Table 10.5. 

The variance $v_5$ is not significantly greater than $v_6$, so it is 
reasonable to combine these to give an improved estimate of the 
residual variance ($v_7$) between pots, based on 7 degrees of freedom; 
this is in row (7) of Table 10.5. Similarly, $v_8$ is not significantly 
greater than $v_9$ and it is reasonable to work on the conclusion that 
the cylinder effect varies from run to run and is measured by the 
combined variance based on 8 degrees. This combined estimate 
of $v_9$ is significantly greater than $v_{10}$. We see, however, that $v_{10}$ is 
not greater than $v_3$, and conclude that the cylinder effect does not 
vary from pot to pot within a run. Similar conclusions may be 
reached regarding the journey variances, except that $v_{14}$ is greater 
than $v_3$. We are justified in combining rows (10) and (3) to obtain 
an improved estimate of $\sigma_3^2$ based on 72 degrees of freedom; it 
is 275.90. 

* The one variance with one asterisk has a value of z very little below 
the 1 per cent. point. 

The significant effects and their corrected variances estimated 
from the apparent variances of Table 10.5 may be summarised as 
follows: 

Differences between pots, 

$\sigma_6^2 \to \dfrac{2\,524.89 - 275.90}{15} = 149.93$

Cylinder effect varying from run to run, 

$\sigma_9^2 \to \dfrac{2\,583.20 - 275.90}{10} = 230.73$

Journey effect, part due to variation from run to run, 

$\sigma_{13}^2 \to \dfrac{1\,770.88 - 598.91}{6} = 195.33$

Journey trend, part due to variation from pot to pot within a run, 

$\sigma_{14}^2 \to \dfrac{598.91 - 275.90}{3} = 107.67$

Residual (unaccounted variation), 

$\sigma_3^2 \to 275.90.$
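The corrected variances just summarised follow a chain: each apparent variance is reduced by the variance next below it in the hierarchy and divided by the appropriate number of readings. A minimal Python sketch of that chain, using only the figures quoted above (the dictionary layout and names are assumptions), is:

```python
# (apparent variance, variance subtracted from it, readings per mean)
# The subtrahend is the next variance down the hierarchy, which is the
# pooled residual 275.90 except for the journey effect common to a run.
POOLED_RESIDUAL = 275.90
effects = {
    "pots": (2524.89, POOLED_RESIDUAL, 15),
    "cylinder x run": (2583.20, POOLED_RESIDUAL, 10),
    "journey x run": (1770.88, 598.91, 6),
    "journey x pot": (598.91, POOLED_RESIDUAL, 3),
}

corrected = {
    name: (apparent - below) / n
    for name, (apparent, below, n) in effects.items()
}
corrected["residual"] = POOLED_RESIDUAL

# Rank the sources by the size of their corrected variance.
ranking = sorted(corrected, key=corrected.get, reverse=True)
```

Ranked in this way the unaccounted residual is the largest single component, followed by the cylinder effect that varies from run to run.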

These estimates of corrected variance measure the relative impor¬ 
tance of the several factors in the way described in section 10.3. 
The estimates are valid, however, only in so far as the third, tenth 
and sixteenth cylinders are representative of the eighteen or so 
that may be made from a pot, and the first five journeys are repre¬ 
sentative of all the journeys possible in a run, in so far as there is 
any secular variation. For example, all the cylinders between the 
third and tenth could have fewer and those between the tenth and 
sixteenth could have more seed than the three measured, and this 
would be a source of variation entirely overlooked by our analysis, 
of which random errors take no account. We shall assume the repre¬ 
sentativeness of the cylinders and journeys. 

All the above values of the true variances are estimates subject 
to random errors, since the cylinder and journey effects vary from 
run to run or from pot to pot, and the few that have provided the 
estimates are a random sample from an infinite population of trends 
obtained from an infinite population of runs. Had there been a 
significant cylinder effect common to all runs, it would not have 
been subject to random errors in quite the same way. However 
many runs there had been, there would have been but two degrees 
of freedom for $v_8$; the effect of a large number of pots is merely to 
increase $n_8$ in the equation 

$v_8 \to n_8\sigma_8^2 + v_9$

and reduce the error $v_9$, thus reducing the effect of random errors 
in $v_8$ in estimating $\sigma_8^2$. Similar remarks apply to the journey effect. 

We have developed the foregoing analysis by treating means: pot 
means, cylinder means, and so on. The computation is much easier, 
however, if corresponding totals are dealt with in the way outlined 
at the end of section 10.3. The general rule is first to square and 
add the various totals and divide by the number of individuals 
forming each total, producing a series of "adjusted sums." Then 
the sum of squares entered in Table 10.5 for any factor measured 
as a series of deviations from the means corresponding to a second 
factor is the "adjusted sum" for the first factor minus that for the 
second. As an example, let us find the sum of squares entered in 
Table 10.5, below rows (8) and (9). The first factor is represented 
by the run-cylinder totals and the second by the run totals (i.e. 
in the analysis, run-cylinder means were measured as deviations 
from run means for all cylinders). There are 10 observations per 
run-cylinder total and 30 per run total; the adjusted sums are: 

$(446^2 + 614^2 + 995^2 + 592^2 + \ldots) \div 10 = 401\,205.70$

and $(2\,055^2 + 1\,880^2 + \ldots) \div 30 = 380\,540.10,$

and the sum of squares is the difference between these, viz.: 
20 665.60. 
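The rule of "adjusted sums" can be checked against the longer method of deviations from means. In the Python sketch below the readings are hypothetical (two runs, three cylinders, two readings each) and the function name is an assumption; the last lines simply repeat the subtraction quoted in the text.

```python
def adjusted_sum(totals, n_per_total):
    """Square and add the totals, then divide by the readings in each total."""
    return sum(t * t for t in totals) / n_per_total


# Hypothetical readings[run][cylinder] -> list of individual readings.
readings = [
    [[12, 15], [20, 18], [30, 27]],
    [[14, 11], [16, 19], [22, 25]],
]
n_cell = 2             # readings per run-cylinder total
n_run = 3 * n_cell     # readings per run total

cell_totals = [sum(cell) for run in readings for cell in run]
run_totals = [sum(sum(cell) for cell in run) for run in readings]

# Short method: difference of the two adjusted sums.
ss_short = adjusted_sum(cell_totals, n_cell) - adjusted_sum(run_totals, n_run)

# Long method: deviations of run-cylinder means from their run mean,
# squared and weighted by the number of readings in each mean.
ss_long = 0.0
for run in readings:
    run_mean = sum(sum(cell) for cell in run) / n_run
    for cell in run:
        ss_long += n_cell * (sum(cell) / n_cell - run_mean) ** 2

# The subtraction quoted in the text for rows (8) and (9).
book_sum_of_squares = 401205.70 - 380540.10
```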


Arrangements for Agricultural and Analogous Experiments 

10.4. In no subject has the application of the principles we have 
just introduced for reducing the errors of comparisons of treatments 
or conditions been so complete as in that of agricultural experiment. 
 1    360   324   306   274   277
 2    287   324   282   252   277
 3    274   302   247   234   256
 4    316   304   232   261   271
 5    354   364   299   302   301
 6    295   294   298   313   268
 7    197   201   236   209   217
 8    335   320   259   205   207
 9    267   262   279   332   268
10    291   300   350   383   264


For that reason the necessity of repeating the comparisons over 
several plots so that the effect of soil variations can be assessed is 
well recognised. It is true, also, that these variations often exhibit 
trends so that if small plots are taken the average fertility of neighbouring 
ones tends to be alike; the art of planning an experiment 
lies in arranging that the varieties or conditions under investigation 
(we shall call them treatments) occur in neighbouring plots, so that 
the larger regional changes in fertility are eliminated from the 
comparison. It is desirable also that the standard error of the differ¬ 
ences due to the uneliminated variations in fertility may be estimated 
so that the significance of any difference can be tested. 

For the purposes of studying soil variations, many uniformity 
trials have been conducted, and the data of Table 10.6, taken from a 
paper by Christidis (1931), are the results of one conducted at 
Cambridge. They are the yields of corn from fifty adjacent plots of 
X 2I feet treated uniformly, and harvested separately. If we 
wished to compare five treatments (say), we could distribute them 
at random over the fifty plots. In such case the standard error of 
the mean for any one treatment would be $\sqrt{v_r/10}$, where $v_r$ is the 
residual variance between plots after treatments have been eliminated, 
and that of the difference between two means would be 
$\sqrt{2}$ times this; here the treatments are the same, and the residual 
variance for such an arrangement is the mean variance for the fifty 
yields and equals 95 942/49 = 1 958. 
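The standard errors implied by this residual variance can be worked out directly. The sketch below assumes, as in the text, five treatments spread at random over fifty plots, so that each treatment mean rests on ten plots; the function names are hypothetical.

```python
import math


def se_treatment_mean(residual_variance, plots_per_treatment):
    """Standard error of one treatment mean."""
    return math.sqrt(residual_variance / plots_per_treatment)


def se_difference(residual_variance, plots_per_treatment):
    """Standard error of the difference between two treatment means."""
    return math.sqrt(2) * se_treatment_mean(residual_variance, plots_per_treatment)


se_mean = se_treatment_mean(1958, 10)   # about 14.0
se_diff = se_difference(1958, 10)       # about 19.8
```

Any reduction of the residual variance by a better arrangement of plots reduces both figures in proportion to its square root.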

10.41. We might expect to eliminate the variations between rows, 
however, by ensuring that every treatment occurs once in each row, 
so that only the residual within-row variance contributes to the errors 
of the differences. Labelling the treatments A, B, C, D and E, such 
an arrangement as the following could be tried, 

A B C D E 

A B C D E 

. . . . . 

We have analysed the variance of the yields of Table 10.6 into 
three portions as in Table 10.3: variations between rows, between 
treatments arranged as shown and a residual within rows; the results 
are in Table 10.7. There are ten rows giving nine degrees of freedom, 
five treatments giving four degrees, fifty plots giving forty-nine 
degrees for the total and leaving thirty-six for the residual. We know, 
however, that the five treatments are all the same, and so may combine 
the treatment and residual sums of squares to give a within-row 
variance; we see that by eliminating the row variations, the 
variance is reduced from 1 958 to 1 218; i.e. a comparison based on 
about 12 replicates properly arranged with one of each treatment 
per row is as good as one based on about 20 perfectly random 
replicates. In practice, if the treatments differ, their significance 
is tested on the basis of the residual variance of 1 063; in this example 
the treatments variance (2 616) paradoxically seems greater than the 
residual. For the difference z = 0.45, and being only just below the 
5 per cent. point, is suggestive (although not strongly) of a real 
effect. This is because we have neglected a fundamental principle 







within rows (i 218); the reader is recommended to try it out on 
the data. 

Other systematic arrangements have been devised to overcome 
such obvious defects as shown by the one just condemned, and of 
them the "chessboard" has been very popular. This places every 
treatment as close to every other as possible, and an arrangement 
for seven treatments in rows of five is here shown. 


A B C D E 

F G A B C 

D E F G A 

B C D E F 


It will be seen that the treatments may be assigned in the same 
order, row after row, and yet they are all represented in every column. 
If there are as many treatments as plots per row, each one may be 
two places to the right of its position in the previous row, viz. 

A B C D E 

D E A B C 

B C D E A 

E A B C D 

C D E A B . 
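Both systematic patterns described above follow simple cyclic rules, which the following Python sketch reproduces. The function names are assumptions; the layouts themselves are those shown in the text.

```python
def sequential_layout(treatments, plots_per_row, n_rows):
    """Assign treatments in the same order, plot after plot and row after row."""
    k = len(treatments)
    return [
        [treatments[(r * plots_per_row + c) % k] for c in range(plots_per_row)]
        for r in range(n_rows)
    ]


def shifted_layout(treatments, shift=2):
    """As many treatments as plots per row; each row moves every
    treatment `shift` places to the right of its previous position."""
    k = len(treatments)
    return [
        [treatments[(c - shift * r) % k] for c in range(k)]
        for r in range(k)
    ]


seven_in_rows_of_five = ["".join(row) for row in sequential_layout("ABCDEFG", 5, 4)]
five_shifted_by_two = ["".join(row) for row in shifted_layout("ABCDE", shift=2)]

for line in five_shifted_by_two:
    print(" ".join(line))
```

Because the rule is fixed in advance, the position of every treatment is determined once the first plot is assigned; nothing is left to chance.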
The usual method of treating the results is to divide them into groups 
containing all the treatments in the order in which they occur; in 
the seven treatments example above, the first group would contain 
the first row and the first two plots of the second, the second would 
contain the last three plots in the second row and the first four in 
the third, and so on; for five treatments, the groups would be rows. 
Then the total variance is analysed into three parts: between groups, 
between treatments and a residual from which the standard error of 
a difference may be calculated. The danger of column variations 
giving spurious effects is eliminated, but such arrangements do 
produce regular, if more complex patterns, and there is a remote 
chance of the soil varying periodically and more or less in step with 
the pattern. Further, the arrangement does not make the best possible 
comparison, for, taking the set of five treatments, the groups 
are rows, and the residual variance (which for Table 10.7 is 1 218) 
includes column differences, although they do not enter into the 
comparisons. The estimates of errors in such systematic arrangements 
are thus not valid. 

Restricted Random Arrangements of Plots 
Random Groups 

10.42. For large numbers of treatments it is better to have groups of 
as many plots as there are treatments, arranged in clumps rather 
than in rows, and to assign the position of each treatment within 
the group by chance. The random arrangement of five treatments in 
the rows of Table 10.6 is a special case of this, and the analysis of 
the yields is the same as in Table 10.7.* 

The Latin Square 

The application of the Latin square arrangement to field experiments 
is due to Fisher (1936).† If there are n treatments to be 
compared, a block of plots having n rows and n columns is marked 
out, and the treatments are distributed at random, with the restric¬ 
tion that each one must occur once in each row and once in each 
column. The field of Table 10.6 has 5 X 10 plots, and so is suitable 

* The results may be written in a table of rows and columns, each row 
referring to a group and each column to a treatment; then the terms of 
Table 10.3 apply directly. 

† The first edition of this book appeared in 1925. 
for the formation of two Latin squares; we give an example below. 
The variance of such an arrangement may be fairly simply analysed 
into five parts: (1) between 2 blocks (1 degree of freedom), 
(2) between 10 rows (the deviations are measured from 2 block 
means, giving 8 degrees of freedom), (3) between columns (we regard 
the 10 columns of the 2 blocks as contributing separately to the 
variance, giving 8 degrees of freedom), (4) between treatments 
(4 degrees of freedom) and (5) a residual with 28 degrees of freedom, 
giving a total of 49 degrees. Equation (10.4) may be extended to 


        1   2   3   4   5

 1      B   E   A   D   C
 2      D   A   E   C   B
 3      E   D   C   B   A
 4      A   C   B   E   D
 5      C   B   D   A   E
 6      D   B   C   A   E
 7      A   C   E   D   B
 8      B   A   D   E   C
 9      E   D   B   C   A
10      C   E   A   B   D
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give equation (10.7), where the same notation is used, the suflSxes r 
referring to blocks, r to rows, t to colunms and u to treatments. 

_ ; k ) = (x, — «) + ix, — X,) + {Xt — »,) + {x„ — xi + d. (10.7) 

The rows and columns are measured as deviations from their block 
means, the blocks and treatments from the grand mean. When this 
is squared and summed, because of the restriction that each treatment 
is represented equally in each row and so on, the product terms 
become zero, and we have equation (10.8), which is parallel to (10.5) 
and may be entered in a table of analysis of variance.

$$S(x - \bar{x})^2 = n^2 S(\bar{x}_q - \bar{x})^2 + n S(\bar{x}_r - \bar{x}_q)^2 + n S(\bar{x}_t - \bar{x}_q)^2 + 2n S(\bar{x}_u - \bar{x})^2 + S d^2, \tag{10.8}$$

where n is the number of rows per block. 

These terms are the sums of squares of deviations of the various
means from the block or grand mean, multiplied by the number of
observations contributing to those various means. The actual arith-



places, the following arithmetical scheme, extended from that in
section 10.3, is convenient:

(1) Square every plot yield and sum,

(2) Find the total yields for the blocks, square and sum them
and divide by the number of plots per block,

(3) Find the total yields for the rows, square and sum them and
divide by the number of plots per row,

(4) Repeat this for columns,

(5) Repeat this for treatments, and

(6) Square the grand total yield and divide by the total number
of plots.


Then the terms in the following are equivalent to the corresponding 
ones in equation (10.8), 

[(1) − (6)] = [(2) − (6)] + [(3) − (2)]

+ [(4) − (2)] + [(5) − (6)] + $S d^2$.
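The arithmetical scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and variable names are hypothetical, the layout is the double Latin square printed earlier (with the final letter of row 10 inferred from the Latin square restriction), and the yields are invented additive values, not the data of Table 10.6.

```python
# Two 5 x 5 Latin squares stacked as 10 rows x 5 columns
# (rows 1-5 form block 1, rows 6-10 form block 2).
# Assumption: the last letter of row 10 is inferred from the restriction
# that each treatment occurs once in every row and column of a square.
LAYOUT = ["BEADC", "DAECB", "EDCBA", "ACBED", "CBDAE",
          "DBCAE", "ACEDB", "BADEC", "EDBCA", "CEABD"]


def double_latin_square_ss(yields, layout=LAYOUT, n=5):
    """Sums of squares by steps (1)-(6) of the arithmetical scheme."""
    n_blocks = len(layout) // n
    blocks = [range(b * n, (b + 1) * n) for b in range(n_blocks)]

    # (1) square every plot yield and sum
    s1 = sum(y * y for row in yields for y in row)
    # (2) block totals, squared, summed, divided by plots per block
    block_totals = [sum(sum(yields[r]) for r in rows) for rows in blocks]
    s2 = sum(t * t for t in block_totals) / (n * n)
    # (3) row totals, divided by plots per row
    s3 = sum(sum(row) ** 2 for row in yields) / n
    # (4) column totals, the columns of each block kept separate
    column_totals = [sum(yields[r][c] for r in rows)
                     for rows in blocks for c in range(n)]
    s4 = sum(t * t for t in column_totals) / n
    # (5) treatment totals over both blocks, divided by plots per treatment
    treatment_totals = {}
    for r, letters in enumerate(layout):
        for c, letter in enumerate(letters):
            treatment_totals[letter] = treatment_totals.get(letter, 0.0) + yields[r][c]
    s5 = sum(t * t for t in treatment_totals.values()) / (n_blocks * n)
    # (6) grand total squared, divided by the total number of plots
    s6 = sum(block_totals) ** 2 / (n_blocks * n * n)

    ss = {"blocks": s2 - s6, "rows": s3 - s2, "columns": s4 - s2,
          "treatments": s5 - s6, "total": s1 - s6}
    ss["residual"] = ss["total"] - (ss["blocks"] + ss["rows"]
                                    + ss["columns"] + ss["treatments"])
    return ss


def additive_yields(layout=LAYOUT, n=5):
    """Invented yields built from block, row, column and treatment effects only."""
    effect = {"A": 0.0, "B": 3.0, "C": -2.0, "D": 5.0, "E": 1.0}
    return [[100.0 + 10.0 * (r // n) + 2.0 * r + 1.5 * c * (1 + r // n) + effect[t]
             for c, t in enumerate(letters)]
            for r, letters in enumerate(layout)]


ss = double_latin_square_ss(additive_yields())
```

Because the invented yields contain no random component, the residual sum of squares should vanish apart from rounding, which is a convenient check that the product terms really do drop out for this arrangement.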


The data of Table 10.6, with the above arrangement of imaginary
treatments, have been reduced in this way, and the analysis is in
Table 10.71. The treatment variance, although greater than the
final residual, is not significantly so (z = 0.38 while z = 0.50
lies on the 5 per cent point), and we will combine them and regard
the mean variance 866 based on 32 degrees of freedom as the residual.
The row and column variances are both greater than the residual
(the z's lie above the 5 per cent point), and the arrangement has
reduced the residual variance to 866 (i.e. 9 replicates so arranged
are as good as 20 at random).


because the rows and columns are not common to both blocks, and
their variations contribute to the block differences.


For comparing treatment means, the standard error of each mean
is $\sqrt{866/2n}$, where 2n is the number of plots per treatment. In
making such comparisons of several means, due regard must be
paid to the general considerations advanced in section 3.33.

It may be mentioned in passing that Latin square arrangements 
cards from a hat, and if this is done it is hardly likely that any
serious lack of randomness will survive in the scheme. However,
workers who have to form many Latin squares are advised to use
such methods as described by Yates (1933a).

Use of Controls 

10.43. One common method of comparing a number of experimental
treatments when there are disturbing variations is to repeat
one of the treatments, a “control,” with each of the others, under
conditions as similar as possible. Then, the differences between the
treatment and control values are measured, and these differences are
used to compare the treatments. In order that the precision of such
an experiment may be measured, the treatment-control pairs must
be replicated, and the relative positions of the treatment and control
within each pair must be fixed independently and at random.

For example, let us suppose we wish to compare five treatments
A B C D E on the plots referred to in Table 10.6, repeating A as
a control with each of the others in a contiguous plot in the same
row. Then we can only use four columns and the arrangement may
be like the following:


Col. 1    Col. 2    Col. 3    Col. 4

  B         A         A         C
  A         D         E         A
  A         B         C         A
  D         A         E         A
 etc.      etc.      etc.      etc.


The differences between the treatments and controls may be found;
if the variance within the treatments of these differences is v and
the number of replicates of each treatment is n, the standard error
of the mean difference between any treatment and the control is
$\sqrt{v/n}$, while that of the mean difference between any two of the
treatments B to E is $\sqrt{2v/n}$.
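The two standard errors just stated can be written out as a short Python sketch. The function name is hypothetical, and the number of replicates used in the example (n = 8) is an assumption made purely for illustration; only the value v = 737.6 is the one quoted in the text for the plots of Table 10.6.

```python
import math


def control_design_standard_errors(v, n):
    """Standard errors for a design with a control paired with each treatment.

    v: variance (within treatments) of the treatment-minus-control differences
    n: number of replicate pairs for each treatment
    """
    se_treatment_vs_control = math.sqrt(v / n)
    se_treatment_vs_treatment = math.sqrt(2.0 * v / n)
    return se_treatment_vs_control, se_treatment_vs_treatment


# v is taken from the text; n = 8 is a hypothetical number of replicates.
se_control, se_pair = control_design_standard_errors(737.6, 8)
```

The second standard error is always the first multiplied by the square root of two, since comparing two treatments involves two independent mean differences from the control.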

Since the data of Table 10.6 are the result of a uniformity trial,
v may be estimated as twice the variance within the pairs, and from
the formula at the end of section 5.1 (p. 111), we find that v = 737.6.
The standard error of the difference between any two of the four
treatment effects is therefore $\sqrt{1475.2/n}$, and 8n plots have been
used. The standard error of the difference between treatment A



the treatment and control should be decided by chance. 

Multiple Factor Experiments 

10.5. A multiple factor or factorial experiment is one in which
several factors are varied together. For example, in an experiment
arranged to investigate the effect of an ingredient of the size mixture
used for protecting cotton warps in weaving, there were four treatments
in which there were two forms of this ingredient, A and B,
and each was used in the size in two concentrations: 1 and 2 units.
The treatments were:

(i) 1 unit of ingredient A

(ii) 2 units of ingredient A

(iii) 1 unit of ingredient B

(iv) 2 units of ingredient B.

Further, the experiment was done twice on two separate qualities
of yarn, X and Y. Thus there were three factors: form of ingredient,
quantity of ingredient and quality of yarn. The quality measured
on the warps was the rate at which the warp threads broke during
weaving, the rate being expressed as the number per 10 000 picks.*

* One “pick” corresponds to the insertion of one weft thread.
Period

  5     2.8 (iii)    1.5 (i)      1.9 (ii)     2.0 (iv)
  6     1.5 (iv)     1.2 (ii)     2.0 (i)      1.8 (iii)
  7     2.5 (ii)     1.4 (iv)     2.4 (iii)    2.0 (i)
  8     5.8 (i)      1.9 (iii)    1.7 (iv)     3.2 (ii)


It is possible, however, that the effect of varying the quantity of
ingredient may be different for form A than that for form B, or
alternatively, the difference in response of warp breakage rate to
A and B may depend on the quantity of each present. If this is so,
there is said to be an interaction between form and quantity of
ingredient. There may be interactions between any pair of the three
factors. If such interactions exist, it may also be that the effect of
increasing the quantity, say, from 1 to 2 units may be different
according to whether the ingredient is A on yarn X, A on Y,
B on X or B on Y. If such an effect exists, there is said to be a
second-order interaction between the three factors.
The main effects and interactions may be measured and tested
by an analysis of variance on the results, and with this in view we
shall deal with results of the above experiments in full.
There were four warps* of each yarn, one treatment being used
for each warp. The warps of yarn X were woven in four looms
and the total weaving time was divided into four periods. Between
the periods, the warps were interchanged among the looms in such


TABLE 10.9

Analysis of Variance of Warp Breakage Rate

Source of Variation                 Sums of     Degrees of    Variance
                                    Squares     Freedom

 (1) Yarns                           17.701         1          17.701
 (2) Periods                         10.348         6           1.725
 (3) Looms                           36.763         6           6.127
 (4) Treatments                      28.978         6           4.830
 (5) Residual                         8.539        12           0.712
 (6) Total                          102.329        31             -

Main Effects
 (7) Quantity                         5.120         1           5.120
 (8) Ingredient                      17.111         1          17.111

Interactions
 (9) Quantity-Ingredient              0.046         1           0.046
(10) Quantity-Yarn                    0.320         1           0.320
(11) Ingredient-Yarn                  6.302         1           6.302
(12) Quantity-Ingredient-Yarn         0.079         1           0.079

a way as to complete a Latin square arrangement with periods as
rows and looms as columns. The warps of yarn Y were woven in
separate looms and periods from X, in another Latin square. The
arrangement and warp breakage rates are given in Table 10.8.

First let us analyse these data as we did the crop yields in the
double Latin square of section 10.42, except that since we are
considering the possibility of yarn-treatment interactions we will keep
separate the four treatments in each block, giving eight duplicated
treatments with six degrees of freedom. The analysis is in rows
(1) to (6) of Table 10.9. Rows (2) to (5) would be obtained if the
two Latin squares were analysed separately and the results com-

* A warp is a unit quantity of yarn sized uniformly.








for the four basic treatments are:

                       Ingredient
                       A        B

1 unit    . .        32.5     21.4
2 units   . .        26.7     14.4


This is a 2 × 2 table of the kind analysed in section 10.3, and may
be analysed into parts due to quantity, ingredient and the residual,
each with one degree of freedom. Here the residual is the interaction
between quantity and ingredient. For inclusion in Table 10.9
and comparison with the sums of squares given there, the sums of
squares of this subsidiary analysis must be divided by 8, the number
of readings per total. The same results would be obtained if the
analysis were performed on the treatment means and the sums of
squares were multiplied by eight. This is equivalent to performing
the summations over all individual observations, and follows the
lines of previous analyses. The results are in rows (7), (8) and (9)
of Table 10.9.
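As a rough check on this subsidiary analysis, the sums of squares of a 2 × 2 table of totals can be computed with the short Python sketch below. The function name is hypothetical, and the four totals are taken here as 32.5, 21.4, 26.7 and 14.4, an assumption chosen so that they agree with rows (7) to (9) of Table 10.9; each total is based on 8 readings.

```python
def two_by_two_ss(totals, readings_per_total):
    """Sums of squares for a 2 x 2 table of totals.

    totals: ((a1, b1), (a2, b2)); rows are quantities, columns are ingredients.
    Returns (row effect, column effect, interaction), each with one degree of freedom.
    """
    (a1, b1), (a2, b2) = totals
    grand = a1 + b1 + a2 + b2
    correction = grand ** 2 / (4 * readings_per_total)
    rows = ((a1 + b1) ** 2 + (a2 + b2) ** 2) / (2 * readings_per_total) - correction
    cols = ((a1 + a2) ** 2 + (b1 + b2) ** 2) / (2 * readings_per_total) - correction
    cells = (a1 ** 2 + b1 ** 2 + a2 ** 2 + b2 ** 2) / readings_per_total - correction
    return rows, cols, cells - rows - cols


# Assumed totals: 1 unit (A, B) and 2 units (A, B), each from 8 readings.
quantity, ingredient, interaction = two_by_two_ss(((32.5, 21.4), (26.7, 14.4)), 8)
# quantity is about 5.120, ingredient about 17.111, interaction about 0.045
```

Dividing every squared total by the number of readings behind it is what makes these figures directly comparable with the sums of squares taken over the individual observations.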

To obtain the quantity-yarn interactions, we may eliminate the
ingredient effects and obtain the following totals:

                         Yarn
                       X        Y

1 unit    . .        33.7     20.2
2 units   . .        25.7     15.4


The analysis of these totals with the sums of squares divided by
eight gives variances due to: yarn [row (1) of Table 10.9], quantity
[row (7)] and the quantity-yarn interaction [row (10)].

Similarly, the ingredient-yarn interaction entered in row (11) of
Table 10.9 may be obtained from the following totals:

                       Ingredient
                       A        B

Yarn X    . .        39.1     20.3
Yarn Y    . .        20.1     15.5




s tl 
e V 


ibie 
ent: 


Irigredient A y 


2 * 5 , 


ingreatem is 


I —- - - 2 

\yam Y .. i - 9. 


For comparing the quantities, since there are 16 readings per mean,
the standard error of the difference is $\pm\sqrt{2 \times 0.712/16} = \pm 0.30$,
and the corresponding standard error of a difference between
ingredients on one yarn is $\pm\sqrt{2 \times 0.712/8} = \pm 0.42$. The difference
between ingredients on yarn Y is not significant.
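The arithmetic of these two standard errors is easily reproduced. The sketch below uses a hypothetical helper name; the residual variance 0.712 and the numbers of readings per mean (16 and 8) are those quoted in the text.

```python
import math


def se_of_difference(residual_variance, readings_per_mean):
    """Standard error of the difference between two means of equal size."""
    return math.sqrt(2.0 * residual_variance / readings_per_mean)


se_quantities = se_of_difference(0.712, 16)         # about 0.30
se_ingredients_one_yarn = se_of_difference(0.712, 8)  # about 0.42
```

Halving the number of readings per mean multiplies the standard error by the square root of two, which is why comparisons within a single yarn are less precise than those averaged over both.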



dient A gives a considerably lower breakage rate than B and that
on yarn Y there is little or no difference in response to the two
ingredients. It is interesting to note that the breakage rate is higher
on yarn X than Y, and although we have not tested to see whether
this is due to the yarns or to loom and period variations, it may be
that the higher breakage rate is necessary to give sensitivity to the
form of ingredient.

The computations for this example may be readily performed by
the method of section 10.3, squaring and summing the various totals
and dividing by the numbers of original observations in the totals.
Before combining the results of the two Latin squares, it is well to
perform the analyses separately to see if the two residual variances
differ significantly. If they do, the tables should not be combined.
We have ascertained that the two residuals are substantially the
same, but have not given the results.

When possible, it is advisable to have more variations of each 
factor than were used in this experiment. With so few degrees of 































view of the agriculturist is given by Sanders and Wishart (1935).
Practically the whole of this experimental technique owes its inspiration
to Professor R. A. Fisher, and readers are strongly recommended
to refer to his book The Design of Experiments (1936b). There,
Professor Fisher deals with the fundamental principles of experimental
procedure and treats of various methods in detail.

Of the possible methods of arrangement, the most suitable will
depend on the particular field of inquiry, on technical considerations
and on the nature and extent of the variations in the untreated
subject of experiment. In this connection, statistical surveys or
uniformity trials are very valuable.

Results obtained from any particular experiment are subject to
the obvious limitation that they do not necessarily apply under
conditions other than those obtaining for the trial. For example, in
a cereal variety trial, variety A may be significantly better than
variety B, but it does not follow that A will always be better than B,
e.g. in different localities and weathers, nor even that A will be
better than B on the average. If it is desired to know whether A
gives a better yield than B on the average for all years and for
several localities, the trial must be repeated in those localities and 
for several years, and the mean difference between the varieties be 
compared with a standard error obtained, not from a mean residual 
variance within the trials but from a residual variance between the 
trials. This point needs emphasis, because it may happen in practice 
that a highly significant result based on a single trial has only a 
narrow range of application. 

This is one reason why multiple-factor experiments such as that
exemplified in section 10.5 are advocated. The average effect of
increasing the amount of the ingredients investigated there is
established not only for one ingredient, but for two ingredients
and yarns, and the applicability of the results is correspondingly
widened.

In all the arrangements we have considered in this chapter, the
various factors have balanced so that as far as possible each individual
of any one factor is associated once with each individual of
another. This has made the analysis much easier than it would
otherwise have been. Equations (10.5) and (10.8) were much simplified
because certain product terms were zero, so that the deviations
due to each factor could easily be separated and estimated
independently. This property is termed orthogonality and is highly
desirable.

Sometimes, however, accidents occur and individual readings are
lost. In many such instances the missing readings may be estimated
by the method of least squares, so that the analysis of variance is
still possible. In some experiments, too, a complete arrangement
involving every combination of all factors may be unwieldy and
unnecessary. Then certain combinations may be omitted, and the
corresponding treatments are said to be confounded. For example,
in the arrangement of Table 10.8, the yarn effect is confounded with
the loom and period effects, and there is no possibility of measuring
the interactions. This aspect is dealt with by Fisher (1936b), and
the treatment of a number of incomplete and confounded arrangements
is worked out in a number of papers by Yates (1933 and
1936).

Whether the experimental arrangement is balanced or unbalanced, 
it should be carefully designed, otherwise serious inefficiency may 
result or two important effects may be confounded and so be 
unmeasurable. The time to consult statistical principles is before 













242 METHODS OF STATISTICS 

the experiment is planned, not after the results are obtained and 
are in confusion. 

It is well to emphasise again that all the tests of significance and
estimates of corrected variance in this chapter are based on the
assumption of a homogeneous residual variation; i.e. it is assumed
that the residual variance is, within the limits of random errors, the
same for all blocks, rows, columns and factors that are combined.



CHAPTER XI 


MULTIPLE AND PARTIAL CORRELATION AND REGRESSION


Partial Correlation

11.1. We showed in the last chapter how the heterogeneity of
variability has an important effect on the theory of errors, and in
this chapter we shall study its effect on correlation.

Table 11.1 shows the mean grain and straw yields for each of

TABLE 11.1

Grain and Straw Yields

              Block 1         Block 2         Block 3         Block 4
Treatment   Grain  Straw    Grain  Straw    Grain  Straw    Grain  Straw

    A        620    242      646    321      681    261      644    317
    B        644    267      745    383      542    201      711    316
    C        523    215      713    330      686    298      688    381
    D        601    212      693    393      685    265      714    255
    E        664    322      693    370      666    284      516    323
    F        514    200      637    361      697    259      710    361
    G        550    260      708    318      663    266      673    340
    H        521    203      661    275      594    207      730    331

              Block 5         Block 6         Block 7         Block 8
Treatment   Grain  Straw    Grain  Straw    Grain  Straw    Grain  Straw

    A        706    255      615    331      552    216      726    295
    B        705    280      637    285      543    200      646    309
    C        692    300      612    294      635    356      748    284
    D        699    238      697    309      701    383      746    324
    E        656    232      663    393      657    351      653    363
    F        633    234      595    258      697    306      712    376
    G        671    362      626    400      655    276      671    385
    H        625    229      644    266      745    376      747    328

64 plots (the units do not concern us here), there being eight different
manurial treatments and eight replicates on each. The data are
selected from a paper by Eden and Fisher (1927), and are merely
used here for illustrative purposes; readers who are interested in the
agricultural aspect must consult the original paper. Further, we



four pairs of readings, and the crude correlation of the actual readings
gives a coefficient of +0.524, indicating a positive relationship.
This, however, is not the whole story; the variations are produced
by three groups of causes, changes in soil fertility, changes in
treatments and that complex of unknown causes which we call random,
and it is unlikely that the relationship between grain and straw
yields is the same for the three types of variation. Indeed, in
a general way, plots that produce most grain might be expected
to produce most straw; but the treatments which had an effect


the eight pairs of treatment means (or totals). Using the same notation
as before, but calling the grain yield y and the straw yield x,
the values of an individual pair of readings in the sth treatment as
deviations from the grand means are

$$(x - \bar{x}) = (x - \bar{x}_s) + (\bar{x}_s - \bar{x})$$

and

$$(y - \bar{y}) = (y - \bar{y}_s) + (\bar{y}_s - \bar{y}), \tag{11.1}$$















TABLE 11.2. Analysis of Variance and Co-variance of Grain and Straw Yields
and their product summed for the sth treatment is

$$S'(x - \bar{x})(y - \bar{y}) = S'(x - \bar{x}_s)(y - \bar{y}_s) + n_s(\bar{x}_s - \bar{x})(\bar{y}_s - \bar{y}) + (\bar{y}_s - \bar{y})S'(x - \bar{x}_s) + (\bar{x}_s - \bar{x})S'(y - \bar{y}_s),$$

where $n_s$ is the number of plots per treatment. The last two terms
are zero since $S'(x - \bar{x}_s)$ and $S'(y - \bar{y}_s)$ are the sums of deviations
from the mean and are zero, so this equation, when summed further
over all treatments, gives

$$S(x - \bar{x})(y - \bar{y}) = SS'(x - \bar{x}_s)(y - \bar{y}_s) + \mathop{S}_{s} n_s(\bar{x}_s - \bar{x})(\bar{y}_s - \bar{y}). \tag{11.11}$$

The term on the left-hand side is the sum of products used in
correlating the crude deviations from the grand mean, the first term
on the right-hand side is used in correlating the deviations from the
treatment means and the second term in correlating the treatment
means themselves. This equation is exactly parallel to equation (6.6)
on p. 132 for the analysis of variance, and may similarly be entered
in a table of analysis of co-variance. The degrees of freedom are
reckoned up in the same way, and the sum of products divided by the
number of degrees of freedom gives the mean product or co-variance.
The sums of squares and variances can be entered in the same
table, and the co-variance of any cause of variation when divided
by the square root of the product of the variances [equation (7.4),
p. 165] gives the corresponding correlation coefficient.
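The splitting of the sum of products into a within-treatment and a between-treatment part can be illustrated with a small Python sketch. The function names and the twelve (x, y) pairs are invented for illustration and are not taken from Table 11.1; the correlation for each part is computed from the sums of squares and products directly, since the degrees of freedom cancel in the ratio.

```python
import math


def sums_of_squares_and_products(groups):
    """Split sums of squares and products into within- and between-group parts.

    groups: one list of (x, y) pairs per treatment.
    Returns three dicts (total, within, between) with keys "xx", "yy", "xy".
    """
    pairs = [p for g in groups for p in g]
    n_total = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n_total
    mean_y = sum(y for _, y in pairs) / n_total

    def blank():
        return {"xx": 0.0, "yy": 0.0, "xy": 0.0}

    total, within, between = blank(), blank(), blank()
    for x, y in pairs:
        total["xx"] += (x - mean_x) ** 2
        total["yy"] += (y - mean_y) ** 2
        total["xy"] += (x - mean_x) * (y - mean_y)
    for g in groups:
        n_s = len(g)
        gx = sum(x for x, _ in g) / n_s
        gy = sum(y for _, y in g) / n_s
        for x, y in g:
            within["xx"] += (x - gx) ** 2
            within["yy"] += (y - gy) ** 2
            within["xy"] += (x - gx) * (y - gy)
        between["xx"] += n_s * (gx - mean_x) ** 2
        between["yy"] += n_s * (gy - mean_y) ** 2
        between["xy"] += n_s * (gx - mean_x) * (gy - mean_y)
    return total, within, between


def correlation(part):
    """Correlation coefficient from sums of squares and products."""
    return part["xy"] / math.sqrt(part["xx"] * part["yy"])


# Hypothetical data: three treatments of four plots each.
groups = [
    [(1, 10), (2, 12), (3, 13), (4, 15)],
    [(5, 6), (6, 8), (7, 9), (8, 11)],
    [(9, 2), (10, 4), (11, 5), (12, 7)],
]
total, within, between = sums_of_squares_and_products(groups)
```

In this invented example the within-treatment correlation is positive while the correlation of the treatment means is negative, so the crude coefficient on its own would give a misleading picture of either relationship.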

We can deal with the co-variance in exactly the same way as
the variance, using all the arguments and equations previously used,
except that co-variance may sometimes be negative while variance
must always be positive. We saw in the last chapter that variance
may be analysed into more than two parts, and now will use the
same methods to analyse the co-variance of the yields of Table 11.1
into three parts. For a single plot in the sth treatment and tth block,

$$(x - \bar{x}) = (\bar{x}_s - \bar{x}) + (\bar{x}_t - \bar{x}) + x',$$

$$(y - \bar{y}) = (\bar{y}_s - \bar{y}) + (\bar{y}_t - \bar{y}) + y',$$

where x' and y' are residual deviations; and by a similar argument
to that used above, summing over all treatments and blocks,

$$S(x - \bar{x})(y - \bar{y}) = S n_s(\bar{x}_s - \bar{x})(\bar{y}_s - \bar{y}) + S n_t(\bar{x}_t - \bar{x})(\bar{y}_t - \bar{y}) + S x'y', \tag{11.2}$$
where $n_s$ is the number of plots in a treatment and $n_t$ the number
in a block. This is exactly parallel to equation (10.5), p. 215,* and the
terms have been entered in Table 11.2 under the sums of products
column, and when divided by the degrees of freedom, as co-variances.
For convenience of computing, equation (7.61), p. 166,
may be applied to find the terms of the above; i.e.

$$S(x - \bar{x})(y - \bar{y}) = Sxy - N\bar{x}\bar{y},$$

$$S n_s(\bar{x}_s - \bar{x})(\bar{y}_s - \bar{y}) = S n_s \bar{x}_s \bar{y}_s - N\bar{x}\bar{y},$$

and so on; but particular care must be taken of signs, as co-variance
may be negative. The correlation coefficients in Table 11.2 have
been found from the sums of squares and products, and whereas
the coefficient for block and residual variations is about +0.6, for
the treatment variations it is −0.3; the difference is important.
The residual deviations are independent of block and treatment
differences, and their correlation coefficient is a partial coefficient
with both block and treatment factors eliminated. The eight treatment
totals are independent of the blocks and the block totals are
independent of the treatments, but both are to some extent affected
by the residual deviations, and although their correlation coefficients
are in some degree partial, only one factor has been eliminated.
However, as there are several plots per block and treatment, the
effects of these two factors predominate over the residual in the
correlation coefficients.

The maximum and minimum temperatures in Table 7.4, p. 147,
are daily readings taken in the month of August for 49 years, and
we may now be led to inquire if the variations between and within
years are of the same nature. Fisher and Hoblyn give the sums of
squares and products, and we repeat them in Table 11.3, together
with the correlation coefficients.† The variances show that the
between-year variations are real, and the correlation coefficients
show that the association between maximum and minimum temperatures
is stronger for these variations than for those within the
year; that is to say, if we know the average maximum temperature
for the month of August in any one year we can estimate the average
minimum for that month with a little greater accuracy than we can
the minimum for any one day, knowing the corresponding maximum.

* If $n_s$ and $n_t$ are constant, they may be placed outside the summation sign.

† The data of Table 11.3 have been calculated from Fisher and Hoblyn’s
full correlation table and not from the condensed Table 7.4.
is given in the last row of Table 11.21. The variance due to treatments
is now significantly greater than the residual, so that after
correction for the straw yield, the treatments have a significant effect
on grain yield. This is partly because of the considerable reduction
in the residual variance resulting from the important partial correlation
between the two yields. We make no attempt to give a biological
interpretation of this result.

If a is the regression of grain on straw yield for the residual 


TABLE 11.21

Analysis of Variance of Grain Yield after Correction for Regression
on Straw Yield

Source of Variation          Sum of Squares    Degrees of    Variance
                                               Freedom

Total within blocks            123 825.1           55         2 251.4
Residual                        89 026.1           48         1 854.7
Treatments (difference)         34 799.0            7         4 971.3

deviations as estimated from the third row of Table 11.2, the treatment
means corrected for variations in straw yield are

$$\bar{y}_s \text{ (corrected)} = \bar{y}_s - a(\bar{x}_s - \bar{x}).$$

Here,

If a were the true population value, the standard error of the difference
between any two corrected means would be calculable from
the corrected residual variance in Table 11.21. As it is, the error
in a also contributes somewhat to the standard errors of the corrected
treatment means. It is for this reason that the variance due to treatments
in Table 11.21 cannot be obtained directly from the corrected
means.

A discussion of the practical aspects of this application of the
analysis of co-variance technique to agricultural experiments is given
by Fisher (1936), Sanders and Wishart (1935).





Elimination of a Linear Factor 

11.3. In section 11.1, when finding the partial correlation between
grain and straw yields with block and treatment effects eliminated,
we measured the yields as deviations from discrete block and treatment
means and correlated the deviations. Sometimes, however, the
factor to be eliminated can be described quantitatively and its
relation to the variates to be correlated can be expressed by regression
lines instead of by discrete means. Then the deviations from
these lines may be correlated. We shall deal with partial correlation
when the regression lines are straight.

If the quantities are x, y and z and z is to be eliminated, let the
values of x and y lying on the regression lines relating them to z
be $X_z$ and $Y_z$. Then equation (11.11) becomes

$$S(x - \bar{x})(y - \bar{y}) = S(x - X_z)(y - Y_z) + S(X_z - \bar{x})(Y_z - \bar{y}).$$

The deviations from the regression lines may be found explicitly,
and from them the partial correlation coefficient, or if the regressions
are linear, the following formula may be used,

$$r_{xy \cdot z} = \frac{r_{xy} - r_{xz} r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}, \tag{11.3}$$

where $r_{xy \cdot z}$ is the partial correlation coefficient between x and y with
z eliminated, and $r_{xy}$, $r_{xz}$ and $r_{yz}$ are the three total correlation
coefficients between x, y and z. This equation is proved in the
appendix to this chapter. By interchanging the suffixes, the partials
between x and z and between y and z can be found from the
same formula. In using equation (11.3), tables by Miner (1922) of
$\sqrt{1 - r^2}$ for different values of r will be found very useful.
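Equation (11.3) is simple to apply repeatedly, and a minimal Python sketch may make the procedure concrete. The function and variable names are hypothetical; the six total correlation coefficients are those quoted in Table 11.4 for vital capacity (v), height (h), weight (w) and age (a).

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Partial correlation of x and y with z eliminated, equation (11.3)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Total correlation coefficients as quoted in Table 11.4.
r_vh, r_vw, r_va = 0.835, 0.851, 0.662
r_ha, r_wa, r_hw = 0.714, 0.701, 0.897

# First-order partials: age eliminated.
r_vh_a = partial_r(r_vh, r_va, r_ha)
r_vw_a = partial_r(r_vw, r_va, r_wa)
r_hw_a = partial_r(r_hw, r_ha, r_wa)

# Second-order partials: a further factor eliminated by applying
# the same formula to the first-order coefficients.
r_vh_wa = partial_r(r_vh_a, r_vw_a, r_hw_a)  # weight and age eliminated
r_vw_ha = partial_r(r_vw_a, r_vh_a, r_hw_a)  # height and age eliminated
```

The first-order values come out at about 0.690, 0.724 and 0.794. The second-order values are computed here from unrounded first-order coefficients, so they may differ slightly in the third decimal place from figures worked with fewer significant figures.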

Mumford and Young (1923) give the correlation coefficients in
Table 11.4, based on measurements taken on 1 110 boys; we will
assume the regressions to be linear. It would appear at first sight that
there is a strong tendency for tall boys to have a large vital capacity
(the correlation coefficient is 0.835), but the older boys tend to be taller
and (as might be expected) to have a larger vital capacity. It is reasonable,
therefore, to suppose that a good deal of the apparent association
between vital capacity and height may be due to the fact that the
taller boys are also older, and before the true connection can be found,
correction must be made for age. Similar arguments apply to the
correlations between vital capacity and weight and between height
and weight, and, using equation (11.3) to find the three partial
correlations with age eliminated, we obtain the results in the upper
part of Table 11.41. The first three coefficients show that the relationships
between vital capacity, height and weight continue to exist,
even though the age is kept constant, although they are a little less

TABLE 11.4*

Correlation Coefficients

Vital capacity and height   +0.835     Vital capacity and weight   +0.851
Vital capacity and age      +0.662     Weight and age              +0.701
Height and age              +0.714     Height and weight           +0.897

* We only include a portion of the data in the paper.

strong. We are still, however, far from a complete knowledge; part
of the association between vital capacity and height may really be
an indirect consequence of the fact that tall boys are also heavy, and
weight may be the determining factor, or, conversely, height may
have the direct and weight the indirect connection with vital capacity.
Again we may use the formula of equation (11.3) to eliminate further,
and in turn, weight and height, and the results are in the lower part
of Table 11.41. The correlations are now much reduced, but are


TABLE 11.41

Partial Correlation Coefficients

Vital capacity and height (age eliminated)               +0.690
Vital capacity and weight (age eliminated)               +0.724
Height and weight (age eliminated)                       +0.794

Vital capacity and height (weight and age eliminated)    +0.271
Vital capacity and weight (height and age eliminated)    +0.399


still real, and it appears that the association of vital capacity with
height is a little less important than that with weight. Thus, by
means of equation (11.3), any number of factors can be eliminated
one at a time, and ultimate partial correlation coefficients can be
found. In using the formula, it is advisable to retain more significant
figures than will be required in the final answer, as errors due to
too drastic approximation tend to multiply themselves. Mumford
and Young in their paper consider other factors (stem length, chest
girth), and the conclusions even of the second part of Table 11.41
cannot be regarded as final.

Elimination of a Non-linear Factor 

11.3. If the regression between any two factors differs much from
linearity, equation (11.3) is not valid, although it may often be used
as a fair approximation when the departure from linearity, although
statistically significant, is not very great. It is always possible, of
course, to fit curved regression lines of x and y on z, to find the actual
deviations of x and y from those lines and to correlate the deviations,
but in such circumstances it is obviously desirable (and necessary
if tests of significance are to be applied) that both x and y should
be fitted in the same way and to the same degree. Any method of
smoothing may be used to determine the regression lines, but some
such form as the polynomial, which can be found by the method of
least squares, is preferable so that the number of degrees of freedom
absorbed in fitting may be known. If the polynomial is used (say) to
the third degree in z, and x and y are the variables which are to be
partially correlated, it is permissible to regard x, y, z, z² and z³ as
five variables related linearly (finding the individual values of
z² and z³ from those of z), to correlate them in every possible way,
and then to eliminate z, z² and z³ in turn by repeated applications
of equation (11.3) as shown in the last section. If there is only one
value of the dependent variable to every one of that which is to
be eliminated, and a polynomial can be fitted according to Fisher’s
system as outlined in section 9.31, the operation of correlating
residuals is very easy; we deal with it in the next section.


11.31. We found constants A, B, C, etc., of the polynomial [equation
( 9 -^), p. 197] and said that the sums of squares of the residuals were
found by subtracting in turn from the crude sum of squares of the
observations, quantities NA², etc. (section 9.31); similarly the sums
of products of the residuals after fitting any number of terms of the
polynomial are obtained by subtracting successively from the crude
sum of products the quantities

$$N A_1 A_2, \quad \frac{N(N^2 - 1)}{12} B_1 B_2, \quad \frac{N(N^2 - 1)(N^2 - 4)}{180} C_1 C_2, \quad \frac{N(N^2 - 1)(N^2 - 4)(N^2 - 9)}{2\,800} D_1 D_2, \ \ldots \tag{11.4}$$
(paying due regard to sign) where $A_1$, $B_1$, etc., are the constants of
the polynomial of the first factor on the eliminated one and $A_2$, $B_2$,
etc., are the constants of the polynomial relating the second factor
to the eliminated one. If a curve of zero order is fitted, i.e. if the
deviations are measured from the mean, the sum of products of
residuals is $S(xy) - N A_1 A_2$, and this is the ordinary method of
correcting to the mean by subtracting $N\bar{x}\bar{y}$; the subtraction of the
other terms for curves of higher order may be regarded as equivalent
to correcting to a moving mean. These product terms are just as
much part of the analysis of co-variance as the square terms are of
the analysis of variance, and we have inserted them in Table 11.5
for the two series of protein percentages, the original data being
in Table 8.1, p. 174, and the constants being given on p. 199;
Table 11.5 corresponds exactly to Table 9.4 (p. 200), and the two
should be read in conjunction. It would be a good and convincing
exercise if the reader were to determine the 58 protein contents
given by the two cubic regressions, find the actual residuals, and
see if their sums of squares and products agree with those in Tables
9.4 and 11.5. As before, the sum of products at any row may be
used with the corresponding sums of squares to determine a partial
correlation coefficient. We will now use these data to illustrate another
type of statistical problem.


Spurious Correlation of Time Series 

11.4. An investigator may ask if there is any common factor (perhaps
weather) which affects the protein contents of both series of corn in
the same way, and it is natural to use the correlation coefficient to
examine this question. The crude correlation coefficient as found
from the "totals" rows of Tables 9.4 and 11.5 is −0.638, and
would at first sight appear to indicate that if the common factor
affects protein content, it acts on the two series in opposite ways.
We have to remember, however, that because of the way in which
seed was selected for the two series, one increased in protein content
while the other decreased, and this predominating effect gives a
negative correlation coefficient. In order to investigate the effect of
the possible common factor, we must first eliminate the variation in
protein with time (due to seed selection), and then correlate the
residuals. First, we will assume as an approximation that the
regressions on time are linear, with correlation coefficients of +0.862
and −0.708, and using equation (11.3) (time is z and the protein
contents are x and y), find the partial correlation to be

$$\frac{-0.638 + 0.862 \times 0.708}{\sqrt{(1 - 0.862^2)(1 - 0.708^2)}} = -0.08,$$

which, on such a small sample, is not significant. We have seen,
however, that the regression of the two protein contents on time
departs considerably from linearity, and equation (11.3) is useful only
as an approximation; but we may use the analyses of Tables 9.4


TABLE 11.5

Analysis of Co-variance of Protein Content

Source of Variation         Degrees of Freedom    Sums of Products
First order regression              1                 − 33.01
Second order regression             1                 −  2.86
Third order regression              1                 −  2.18
Residual                           25                 +  3.52
Total                              28                 − 34.53


and 11.5 to correlate the residuals after fitting cubic equations. This
correlation coefficient is

$$\frac{+3.52}{\sqrt{12.34 \times 12.06}} = +0.29,$$

and although still too small to be significant on such a small sample,
does suggest that some common factor varying from year to year
may have been affecting the annual residual fluctuations of protein
yield in the two series in the same way.*

* We have treated the first observation in each of the two series as
independent, but they are actually one and the same protein content,
common to both series. This does not affect the constants much, nor,
in this instance, our conclusions.

Such problems, in which it is desired to correlate two series of
observations extended in time or space, have received a good deal of
attention from statisticians; the example in section 7.3 of the
correlation between age at death and the proportion of marriages contracted
in the Church of England comes into this category. Essentially the
problem is a special case of partial correlation, with time (or space)
as the variable to be eliminated, and the discussion has centred
round the development of suitable means of expressing the regression
and obtaining residuals; there is most difficulty in economic
statistics where factors are apt to vary periodically (or in waves),
but for most biological data extending over only a few years, the
method we have outlined is adequate. Even if no such obvious effect
as that in the example just worked is present, the possibility of some
unsuspected time-effect in data collected over some period of time
should always be considered.
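
The two correlations of section 11.4 may be checked with a few lines of Python. This is only a sketch: the function name `partial_correlation` is hypothetical, it assumes equation (11.3) has the usual first-order form implied by the worked figures, and the numbers are those quoted in the text and in Tables 9.4 and 11.5.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after eliminating z linearly (form assumed for equation 11.3)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Linear approximation: crude correlation and the two correlations with time.
r_linear = partial_correlation(-0.638, 0.862, -0.708)

# Cubic trend: correlate the residuals directly from the residual
# sum of products (3.52) and the two residual sums of squares.
r_cubic = 3.52 / math.sqrt(12.34 * 12.06)

print(round(r_linear, 2), round(r_cubic, 2))
```

The first figure reproduces the linear approximation and the second the correlation of the cubic residuals; neither is significant on so small a sample.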


Partial and Multiple Regression 

11.5. With every partial correlation coefficient there goes a partial
regression coefficient, which is obtained by dividing the sum of
products by the appropriate sum of squares of the independent
variable and which expresses the amount of change in one factor
associated with a second when the other factors are constant. The
regression of grain on straw yield (Table 11.2) for the residual
variations is the value of a on p. 250, viz.

$$\frac{58\,549.0}{71\,496.1} = 0.82$$

unit of grain associated with a change of 1 unit of straw per plot.

When there are several quantities all linearly related, partial
regressions are best found by fitting a multiple regression equation.
If y is the dependent variable and $x_1$, $x_2$, $x_3$ . . . are the independent
variables, the linear regression of y on the x's is expressed by an
equation of the form,*

$$Y' = ax_1' + bx_2' + cx_3' + \ldots \tag{11.5}$$

* x' and y' are deviations from the grand means.

Equation (11.5) is called the multiple regression formula of y on $x_1$, $x_2$,
$x_3$, etc., and the constants a, b, c, etc., are the partial regression
coefficients, showing how much unit changes in the individual x
variables affect y, independently and directly. The constants, a, b, c,
etc., may be found by the method of least squares, which leads to
a series of linear equations,

$$\begin{aligned}
Sy'x_1' &= aSx_1'^2 + bSx_1'x_2' + cSx_1'x_3' + \ldots \\
Sy'x_2' &= aSx_1'x_2' + bSx_2'^2 + cSx_2'x_3' + \ldots \\
Sy'x_3' &= aSx_1'x_3' + bSx_2'x_3' + cSx_3'^2 + \ldots
\end{aligned} \tag{11.6}$$

The summations have to be found, of course, from the data, and the
equations have to be solved for a, b, etc. These are not unlike
equations (9.4) given on p. 195 for determining the constants of a
curved regression line; and indeed the two processes are somewhat
the same, for in the earlier one we are finding the multiple regression
of y on x, $x^2$, $x^3$, etc.

We shall illustrate this by finding from the data of Table 11.6
(Harris, 1931) a formula for predicting the loaf volume from the
protein content of the flour and the percentage of it extracted by
potassium bromide solution. The forty-four wheats were analysed
and subjected to the baking test to give the data of the table. If we
call the loaf volume y, and the protein percentages $x_1$ and $x_2$, the
sums of squares and products are as given in the following equations,

$$\begin{aligned}
1\,462.33 &= 107.55a - 161.64b \\
-2\,022.85 &= -161.64a + 473.19b.
\end{aligned}$$

These, solved for a and b, lead to the regression equation,

loaf volume in c.c. = 14.7385 × crude protein per cent.
                      + 0.7597 × protein extracted by KBr per cent.,

all quantities being measured as deviations from their means.
If we were to use the crude protein percentage only, a would then be
the ordinary linear regression, and equal to

$$\frac{1\,462.33}{107.55} = 13.6 \text{ c.c.},$$

for a change of 1 per cent. in protein.
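
The solution of the two simultaneous equations may be verified by a short Python sketch. The helper `solve_2x2` is a hypothetical name introduced for illustration (it applies Cramer's rule); the sums of squares and products are those quoted above for the wheat data.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x + a12*y = b1, a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the equations are not independent")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y

# Sums of squares and products quoted in the text (deviations from means).
S11, S12, S22 = 107.55, -161.64, 473.19   # crude protein, cross-product, KBr protein
S1y, S2y = 1462.33, -2022.85              # products with loaf volume

a, b = solve_2x2(S11, S12, S12, S22, S1y, S2y)
simple_slope = S1y / S11                  # regression on crude protein alone

print(round(a, 4), round(b, 4), round(simple_slope, 1))
```

The two partial regression coefficients agree with those of the regression equation, and the last figure is the ordinary linear regression of 13.6 c.c. per 1 per cent. of protein.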


The Multiple Correlation Coefficient 

11.6. Just as we need the correlation coefficient to give an idea of
the accuracy of a linear regression formula with a single term, so we
need a constant to specify the accuracy of a multiple regression
formula, and this is provided by the multiple correlation coefficient.
This may be found by correlating the predicted values (Y) given by
the formula with the actual ones (y); or alternatively, the sum of
squares of the deviations of y from the regression is $(1 - R^2)$ multiplied
by the sum of squares of the deviations of y from the grand
mean, where R is the multiple correlation coefficient. Thus, multiple
correlation is another case of the analysis of variance, analysing

TABLE 11.6

Loaf Volume   Crude Protein,   Protein Extracted    Loaf Volume   Crude Protein,   Protein Extracted
  (c.c.)        per cent.      by KBr, per cent.*     (c.c.)        per cent.      by KBr, per cent.*

   540            12.0              31.2                472            12.6              21.8
   475            11.5              31.4                500            10.4              31.6
   450            11.0              31.3                520            13.8              28.8
   455            12.6              27.9                525            13.6              26.7
   465            11.9              31.4                490            11.5              30.8
   470            12.0              30.2                505            10.6              31.5
   440            11.7              31.9                475            10.7              31.5
   490            12.1              31.8                545            14.2              28.8
   475            12.9              31.0                534            13.4              27.4
   465            11.3              31.7                490            11.0              29.3
   520            14.7              28.3                507            10.8              29.4
   525            15.2              20.5                530            12.8              28.2
   450            10.5              28.7                505            12.2              29.7
   500            13.1              27.7                495            12.5              28.9
   445            10.1              31.8                515            12.4              30.2
   443            11.2              28.9                505            14.7              21.4
   463            11.3              25.5                535            14.5              22.7
   446            11.5              30.0                445             9.0              34.9
   500            12.9              28.5                470            11.2              29.7
   487            10.4              25.5                435             9.1              34.0
   502            12.7              25.5                475            10.8              29.2
   529            16.0              21.8                505            13.9              26.0

* This is the percentage of the crude protein extracted by KBr.


that of y into two parts, one associated with the regression and the
other a residual, and the degrees of freedom of the former are equal
to the number of terms in the regression equation. We have entered
these terms in Table 11.7, assuming the size of sample to be N and
the number of terms [x's in equation (11.5)] to be m'. If there is
only one term on the right-hand side of the regression equation,
m' = 1, and Table 11.7 reduces to the form for the correlation of
two variables (Table 7.6, p. 158), where R = r. On the other hand,
if we compare Tables 11.7 and 9.3 (p. 193) we see that they are of
essentially the same form, R corresponding to η and m' to (m − 1),
N being the same; thus, the multiple correlation coefficient and the
correlation ratio are essentially analogous constants, both measuring
association, the former between one variate and a number of others
as expressed by a multiple linear regression formula, and the latter
between one variate and another expressed as a series of array means
or as a curved regression line. We have already pointed out the
analogy between the multiple regression formula and the curved
regression line, and now see that it is complete. Like the correlation
ratio, the multiple correlation coefficient is essentially positive, and
becomes larger as the number of degrees of freedom associated with
it (m') increases; indeed, if there were as many constants as there
are independent deviations of y in the sample (m' = N − 1), R
would equal unity. Hence, all that we have written about the
correlation ratio is true of the multiple correlation coefficient.

TABLE 11.7
Analysis of Variance

In order to find R, it is not necessary to find the individual
regression values of y, but the following relation may be used*

$$SY'^2 = R^2Sy'^2 = aSy'x_1' + bSy'x_2' + cSy'x_3' + \ldots \tag{11.7}$$

For our wheat quality and loaf volume data,

$$SY'^2 = R^2Sy'^2 = 14.7385 \times 1\,462.33 - 0.7597 \times 2\,022.85 = 20\,015.8,$$

and since $Sy'^2 = 41\,246.8$, $R^2 = 0.4853$ and $R = 0.697$.
Correlating the loaf volume with the crude protein content only,
we find r = +0.694, and thus see that although it absorbs one
more degree of freedom, the multiple regression equation is scarcely
better than that using protein content as a single factor. We deal
with tests for the significance of such a difference in section 11.72.

* Squaring equation (11.5),

$$\begin{aligned}
Y'^2 &= ax_1'[ax_1' + bx_2' + cx_3' + \ldots] + bx_2'[ax_1' + bx_2' + cx_3' + \ldots] + cx_3'[ax_1' + bx_2' + cx_3' + \ldots] + \ldots \\
     &= a[ax_1'^2 + bx_1'x_2' + cx_1'x_3' + \ldots] + b[ax_1'x_2' + bx_2'^2 + cx_2'x_3' + \ldots] + c[ax_1'x_3' + bx_2'x_3' + cx_3'^2 + \ldots] + \ldots
\end{aligned}$$

and summing,

$$SY'^2 = a[aSx_1'^2 + bSx_1'x_2' + cSx_1'x_3' + \ldots] + b[aSx_1'x_2' + bSx_2'^2 + cSx_2'x_3' + \ldots] + c[aSx_1'x_3' + bSx_2'x_3' + cSx_3'^2 + \ldots]$$

Equation (11.7) follows from this when the values of equation (11.6) are
substituted for the terms in the brackets.
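
As a sketch of the use of equation (11.7), the following Python lines recompute the multiple correlation coefficient for the wheat data from the quantities already quoted, without forming any individual regression values. The variable names are introduced here for illustration only.

```python
import math

# Partial regression coefficients and sums quoted in the text.
a, b = 14.7385, 0.7597
S1y, S2y = 1462.33, -2022.85   # sums of products of loaf volume with x1 and x2
Syy = 41246.8                  # sum of squares of loaf volume
S11 = 107.55                   # sum of squares of crude protein

# Equation (11.7): sum of squares associated with the regression.
ss_regression = a * S1y + b * S2y
R2 = ss_regression / Syy
R = math.sqrt(R2)

# Ordinary correlation of loaf volume with crude protein alone.
r = S1y / math.sqrt(S11 * Syy)

print(round(ss_regression, 1), round(R2, 4), round(R, 3), round(r, 3))
```

The multiple coefficient exceeds the simple one only in the third decimal place, which is the point made in the text.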

Errors of Sampling in Multiple and Partial Correlation 
Partial Correlation Coefficient 

11.71. Fisher (1924c) has shown that the distributions of the partial
and the total correlation coefficients are the same, when they are
estimated from deviations having the same number of degrees of
freedom. Thus, if there are N pairs of observations and m factors
have been eliminated (leaving N − m − 1 degrees of freedom),
the partial correlation coefficient is distributed like that of the total
coefficient based on (N − m) pairs of observations.

The residual correlation coefficient between grain and straw yield
in Table 11.2 is based on 49 degrees of freedom and may be treated
like an ordinary correlation based on 50 pairs of observations, while
the correlation for treatment variations is based on 7 degrees of
freedom, and may be regarded as arising from a sample of 8; we
will use Fisher's z transformation (section 8.2) to test the
significance of the difference between them. The values of z are 0.681
and −0.347 with a difference of 1.028; the standard error of this is

$$\pm\sqrt{\tfrac{1}{47} + \tfrac{1}{5}} = \pm 0.47,$$

and the difference is significant.
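
The comparison of the two coefficients can be sketched in Python as follows. The function names are hypothetical; the sketch assumes, as in the worked figures, that the standard error of z from n pairs is $1/\sqrt{n-3}$, with the equivalent sample sizes 50 and 8 given above.

```python
import math

def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)

def z_difference_test(z1, n1, z2, n2):
    """Difference of two z values, its standard error, and their ratio."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    diff = z1 - z2
    return diff, se, diff / se

# z values quoted in the text, with equivalent numbers of pairs 50 and 8.
diff, se, ratio = z_difference_test(0.681, 50, -0.347, 8)
print(round(diff, 3), round(se, 2), round(ratio, 2))
```

The ratio of the difference to its standard error is rather more than 2, in agreement with the statement that the difference is significant.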

We found in section 11.4 that the partial correlation coefficient
between the residuals of protein contents of two series of corn after
the time trend had been eliminated by a cubic was +0.29, and
since there were 29 observations with four terms eliminated
(including the mean) we must test this as though it were based on
26 pairs of observations. Using Fisher's table of values of the
correlation coefficient at different levels, we should enter it at n = 24;
there is no value at this point, but we see that for n = 25, r = 0.32
lies on the 0.1 level, and our observed value, being less even than
this, is not significant.









For the same reasons as advanced in section 10.1 in connection
with the mean, the sampling error of the correlation coefficient is 
only given by the usual formulae when the correlation is between 
homogeneous and independent deviations arrived at after a complete 
analysis of co-variance. 


Multiple Correlation Coefficient 

11.72. In order to test the significance of a multiple correlation
coefficient we may use the analogy with non-linear regression and
employ the same test. From the analysis of variance of Table 11.7
we find

$$z = \tfrac{1}{2}\log_e \frac{R^2\,(N - m' - 1)}{(1 - R^2)\,m'}$$

and enter Fisher's tables at $n_1 = m'$ and $n_2 = (N - m' - 1)$.
Wishart (1928) gives a table of values of the multiple correlation
coefficient lying on the 0.05 and 0.01 levels of significance.

For the data of Table 11.6 (loaf volumes and wheat qualities),
$R^2 = 0.485$, $m' = 2$, $N = 44$,

$$z = \tfrac{1}{2}\log_e \frac{0.485 \times 41}{0.515 \times 2} = 1.48,$$

and this is even greater than the value of z lying on the 1 per cent.
level when $n_1 = 2$ and $n_2 = 30$ (i.e. greater than 0.84). Hence this
multiple correlation coefficient is significantly greater than zero.

We may also wish to test if a regression formula with a large
number of constants gives a significantly closer prediction than one
using fewer, and this may be done in exactly the same way as in
section 9.2, where we tested for non-linearity of regression. If there
are two regression formulae involving $m_1'$ and $m_2'$ constants, $m_1'$ being
greater than $m_2'$, and the multiple correlation coefficients are $R_1$ and
$R_2$, the complete analysis of variance is as shown in Table 11.71,
and since all the sums of squares are positive, $R_1$ must always be
greater than $R_2$. If, however, the difference between $R_1$ and $R_2$ is
the result of a real effect, the variance for the difference between
regressions will be significantly greater than the residual, and using
Fisher's test, we see if

$$z = \tfrac{1}{2}\log_e \frac{(R_1^2 - R_2^2)\,(N - m_1' - 1)}{(1 - R_1^2)\,(m_1' - m_2')}$$

is significant for degrees of freedom,

$n_1 = m_1' - m_2'$ and $n_2 = N - m_1' - 1$.

Applying this to the data of Table 11.6, for protein content
alone $R_2^2 = 0.482$ ($m_2' = 1$), while for the multiple regression,
$R_1^2 = 0.485$ ($m_1' = 2$); hence

$$z = \tfrac{1}{2}\log_e \frac{0.003 \times 41}{0.515},$$

which is negative, and certainly is not significant. Thus, the extraction of some of the
protein by potassium bromide has added nothing to the information
provided by the crude protein content alone, as far as the prediction
of loaf volume for this series of flours is concerned.
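
Both tests of section 11.72 reduce to half the natural logarithm of a ratio of two mean squares, and may be sketched in Python as below. The function names `z_multiple` and `z_difference` are hypothetical, and the formulae are those implied by the worked figures for the wheat data (N = 44).

```python
import math

def z_multiple(R2, N, m):
    """z for testing a multiple correlation with m terms; n1 = m, n2 = N - m - 1."""
    return 0.5 * math.log(R2 * (N - m - 1) / ((1 - R2) * m))

def z_difference(R2_full, m_full, R2_reduced, m_reduced, N):
    """z for testing whether the fuller regression fits significantly better."""
    ms_difference = (R2_full - R2_reduced) / (m_full - m_reduced)
    ms_residual = (1 - R2_full) / (N - m_full - 1)
    return 0.5 * math.log(ms_difference / ms_residual)

z_all = z_multiple(0.485, 44, 2)                  # both protein measures
z_extra = z_difference(0.485, 2, 0.482, 1, 44)    # gain from adding KBr protein

print(round(z_all, 2), round(z_extra, 2))
```

The first value is to be compared with the tabulated 0.84 at the 1 per cent. level; the second is negative, the mean square for the added term being smaller than the residual.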


11.73. We will illustrate the further application of these tests by
another example, which presents several features; the data are in
Table 11.8. Fifty groups of varying numbers of cotton hairs were
weighed and measured, and the weights of the groups were expressed
as multiples of $10^{-8}$ grammes per centimetre. These were then swollen
in caustic soda and placed in three classes, A, B and C, according to
their appearance under the microscope. The problem is to determine
if the three classes show real differences in average hair-weight per
centimetre. A regression formula of the form of equation (11.5)
may be obtained to express the weight of a group of hairs in terms
of the number in each of the three classes; if we put y equal to the
group weight, and $x_1$, $x_2$ and $x_3$ the numbers in the three classes,
a, b and c are the corresponding mean hair-weights per centimetre.
Similarly a second regression may be obtained in which only the
average hair-weight for all classes, and the total number of hairs
in the group are used; this is the ordinary case with one constant.
Now the former regression, absorbing three degrees of freedom,
will necessarily have associated with it a little more of the variance
in weights of the groups than the second, but if the three groups
really have different hair-weights, the equation with three constants
will fit the data very much better than that with only one, and the
difference in the associated variances will be significantly greater
than zero. We may therefore investigate the problem by finding the
regressions and variances, and testing the significance of the
difference in the latter.

TABLE 11.8.* Numbers of Cotton Hairs and Weights (in $10^{-8}$ grammes)
To find a, b and c we solve equations (11.6), and for this we
need to determine the sums of squares and products from the data.
It is not fair, however, to give all groups, whether large or small,
the same weight in finding these sums. If the number of hairs
in a group is n, and the variability of weights of single hairs is
approximately the same in all classes, with a standard deviation of
σ, the standard error of the mean weight per hair for the group is
$\sigma/\sqrt{n}$, and that of the total weight is n times this, and so is
proportional to $\sqrt{n}$; hence the variance of the total weight of a group due
to variations between hairs within the same class is proportional to
n, the number in the group. We may use the results of section 3.7
and regard 1/n as the quantity of information; then in making
summations, each term may be weighted with this quantity,
multiplying each product and square by 1/n before summing. For example,
the weighted sum,



$$\frac{3243 \times 15.4}{23} + \frac{\ldots5 \times 25.0}{30} + \ldots$$

A further modification is necessary; a group with no hairs naturally
has zero weight, so we shall make our regression line pass through
the origin and not through the mean, and shall measure all deviations
from zero; that is, in equations (11.5) and (11.6) we write x and y
instead of x' and y'. It may assist some to think of the data of
Table 11.8 as a half of the complete data, the other half having
equal negative values, so that the mean is zero, and the sums of
squares and products we obtain are half the total for the imaginary
complete results.

The appropriate sums of squares and products are set out in
equations below, and their solution yields the regression equation following
them.

$$\begin{aligned}
363\,785.7 &= 1\,468.85a + 288.83b + 18.42c \\
73\,472.3 &= 288.83a + 74.73b + 4.94c \\
4\,646.0 &= 18.42a + 4.94b + 1.04c
\end{aligned}$$

$$Y = 226.2557x_1 + 114.127x_2 - 82.16x_3$$

Although the data are to some extent approximate, the arithmetic
must be performed with considerable precision if the final results
are to have any accuracy at all. In finding $S(x_1^2/n)$ we calculated
$x_1/n$ to five decimal places, and then summed $x_1 \times x_1/n$ on a machine;
as a check, we then calculated and summed $x_1^2/n$. In performing
the divisions for the solution of the simultaneous equations we had
to work to nine or ten significant figures to obtain the accuracy of
the constants shown in the regression. It is not claimed that the four
decimal places for a have any physical meaning, but as a constant
of the sample which will be used to find the sum of squares of group
weights, that accuracy is necessary. Classes A and B have differing
mean hair-weights per centimetre, while the hairs of C appear to
have a negative one; however, there were very few of class C, and
this irrational result is due to sampling variations, and signifies
nothing. To find the sum of squares associated with the regression
we use equation (11.7), and find

$$226.2557 \times 363\,785.7 + 114.127 \times 73\,472.3 - 82.16 \times 4\,646.0 = 90\,312\,000.$$

This could not have been given correct even to five figures if we
had determined the regression coefficients correct only (say) to
three. The weighted sum of squares of the hair-weights is

$$S\left(\frac{y^2}{n}\right) = \frac{3243^2}{23} + \ldots = 91\,220\,148,$$

and we will use this to five significant figures only.
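
The figures just given can be checked by substituting the quoted constants back into the three equations. The Python sketch below does this; the variable names are illustrative, the last coefficient of the third equation is read here as 1.04 (an assumption where the print is unclear), and the rounded coefficients are only expected to satisfy the equations to within the rounding.

```python
# Weighted sums of squares and products (left to right: x1, x2, x3).
coefficients = [
    [1468.85, 288.83, 18.42],
    [288.83, 74.73, 4.94],
    [18.42, 4.94, 1.04],    # last entry assumed to read 1.04
]
rhs = [363785.7, 73472.3, 4646.0]      # weighted sums of products with y
solution = [226.2557, 114.127, -82.16]  # a, b, c as quoted

# Discrepancy of each equation when the quoted constants are substituted.
residuals = [
    sum(c * s for c, s in zip(row, solution)) - target
    for row, target in zip(coefficients, rhs)
]

# Equation (11.7): sum of squares associated with the regression.
ss_regression = sum(s * t for s, t in zip(solution, rhs))
ss_total = 91220148
ss_residual = ss_total - ss_regression

print([round(e, 2) for e in residuals], round(ss_regression), round(ss_residual))
```

Each discrepancy is a small fraction of a unit, and the regression and residual sums of squares agree with the rounded 90 312 000 and 908 000 used in the analysis of variance.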

We now have to find a regression equation using one constant,
the mean hair-weight per centimetre for all classes. Using only the
first term and equation of (11.6), and finding the weighted sums of
products and squares, we have

$$S\left(\frac{yx_1}{n}\right) = a'S\left(\frac{x_1^2}{n}\right);$$

since there is only one class, $x_1 = n$, and we obtain $Sy = a'Sn$,
which is the straightforward relationship, mean hair-weight = total
weight of all groups divided by the total number of hairs. This gives
a' = 203.736, and the sum of squares of weights associated with
it is 203.736 × 441 904 = 90 032 000 (correct to five significant
figures).

The analysis of variance is in Table 11.9, and in reckoning the
degrees of freedom it must be remembered that deviations are not
measured from a mean determined from the data, but from the
origin, so that the fifty groups contribute fifty degrees to the sum of
squares. In order to test the significance of the difference in
regressions, we find

$$z = \tfrac{1}{2}\log_e \frac{140\,000}{19\,300} = 0.99,$$

and this is significant, for when $n_1 = 2$ and $n_2 = 30$, z = 0.84
lies on the 1 per cent. level. Thus we conclude that the two classes of
hairs A and B, as determined by swelling in caustic soda, have
significantly different mean hair-weights per centimetre. This result
could not have been obtained directly, because there were no means
of weighing the hairs individually, and it was not practicable to
obtain the individual classes for separate weighing.
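
The analysis of variance for the cotton hairs may be reconstructed from the three sums of squares quoted in the text, as in the following Python sketch (variable names are illustrative; the sums of squares are taken to five significant figures, as the text directs).

```python
import math

ss_total = 91_220_000      # weighted sum of squares of group weights, 50 d.f.
ss_multiple = 90_312_000   # regression on the three classes, 3 d.f.
ss_simple = 90_032_000     # regression on total number of hairs, 1 d.f.

ss_difference = ss_multiple - ss_simple   # 2 degrees of freedom
ss_residual = ss_total - ss_multiple      # 50 - 3 = 47 degrees of freedom

ms_difference = ss_difference / 2
ms_residual = ss_residual / 47

# Fisher's z: half the natural logarithm of the ratio of the mean squares.
z = 0.5 * math.log(ms_difference / ms_residual)
print(ms_difference, round(ms_residual), round(z, 2))
```

The value of z is to be compared with the tabulated 0.84 for $n_1 = 2$ and $n_2 = 30$ at the 1 per cent. level.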


TABLE 11.9

Analysis of Variance of Weights of Groups of Cotton Hairs

Source of Variation            Degrees of Freedom   Sums of Squares   Mean Squares
Multiple regression:                   3              90 312 000
    Simple regression                  1              90 032 000       90 032 000
    Difference in regression           2                 280 000          140 000
Residual                              47                 908 000           19 300
Total                                 50              91 220 000


11.8. We must emphasise the underlying assumption of the methods
of this chapter that the residuals are homogeneous in the sense that
their variance and co-variance are constant. In the example of
section 11.1, for instance, we assume the relation between residual
straw and grain yields is sensibly the same for all treatments and
blocks, and in the example of section 11.5 illustrating multiple
regression, we are assuming that the relation between loaf volume
and protein extracted is the same for all values of crude protein.
The tests of significance also assume the residual deviations to be normally
distributed.
APPENDIX


Proof of the Formula for the Partial Correlation Coefficient


Let x and y be two variables, and let z be the third which is to be
eliminated. Then the regressions of x and y on z (measuring all as
deviations from the mean) are

$$X' = \frac{Sx'z'}{Sz'^2}\,z' \quad\text{and}\quad Y' = \frac{Sy'z'}{Sz'^2}\,z', \tag{11.8}$$

and the partial correlation coefficient between x and y is

$$r_{xy.z} = \frac{S(x' - X')(y' - Y')}{\sqrt{S(x' - X')^2\,S(y' - Y')^2}}. \tag{11.9}$$

Expanding the numerator of (11.9),

$$S(x' - X')(y' - Y') = Sx'y' + SX'Y' - Sx'Y' - SX'y',$$

and substituting from (11.8) for X' and Y',

$$\begin{aligned}
S(x' - X')(y' - Y') &= Sx'y' + \frac{Sx'z'\,Sy'z'}{Sz'^2} - 2\,\frac{Sx'z'\,Sy'z'}{Sz'^2} \\
&= Sx'y' - r_{xz}r_{yz}\sqrt{Sx'^2\,Sy'^2} \\
&= \sqrt{Sx'^2\,Sy'^2}\,(r_{xy} - r_{xz}r_{yz}).
\end{aligned} \tag{11.91}$$

Expanding the first term in the denominator of (11.9),

$$S(x' - X')^2 = Sx'^2 + SX'^2 - 2Sx'X',$$


REFERENCES 


Balls, W. L., and Hancock, H. A. (1926). Measurements of the reversing
spiral in cotton hairs. Proc. Roy. Soc., B, xcix, 130.

Bateson, W. (1913). Mendel's Principles of Heredity. Cambridge University
Press. 345.

Bowley, A. L. (1926). Elements of Statistics. 5th edition. King.

Christidis, B. G. (1931). The importance of the shape of plots in field
experimentation. Jour. Agri. Sci., xxi, 14.

Cochran, W. G. (1936). Statistical analysis of field counts of diseased
plants. Supp. Jour. Roy. Stat. Soc., iii, 49.

Collins, G. N., Flint, L. H., and McLane, J. W. (1929). Electric
stimulation of plant growth. Jour. Agri. Research, xxxviii, 585.

Co-operative Study (1917). On the distribution of the correlation coefficient
in small samples. Biometrika, xi, 328.

Co-operative Study (1923). On the nest and eggs of the Common Tern.
Biometrika, xv, 294.

Corkill, B. (1930). The influence of insulin on the distribution of glycogen
in normal animals. Biochem. Jour., xxiv, 779.

Darbishire, A. D. (1904). On the result of crossing Japanese Waltzing with
Albino mice. Biometrika, iii, 1.

Eden, T., and Fisher, R. A. (1927). The experimental determination of the
value of top dressings with cereals. Jour. Agri. Sci., xvii, 548.

Fisher, R. A. (1915). Frequency distribution of the values of the correlation
coefficient in samples from an indefinitely large population. Biometrika,
x, 507.

Fisher, R. A. (1921). On the "probable error" of a coefficient of correlation
deduced from a small sample. Metron, i, No. 4.

Fisher, R. A. (1922). On the mathematical foundations of statistics. Phil.
Trans. Roy. Soc., A, ccxxii, 309.

Fisher, R. A. (1922b). On the interpretation of $\chi^2$ from contingency tables,
and the calculation of P. Jour. Roy. Stat. Soc., lxxxv, 87.

Fisher, R. A. (1924a). On a distribution yielding the error functions of
several well known statistics. Proc. of the International Math. Congress, 805.

Fisher, R. A. (1924c). The distribution of the partial correlation coefficient.
Metron, iii, No. 3.

Fisher, R. A. (1925a). Theory of statistical estimation. Proc. Camb. Phil.
Soc., xxii, 700.

Fisher, R. A. (1925b). Applications of "Student's" distribution. Metron, v,
No. 3.

Fisher, R. A. (1928). Triplet children in Great Britain and Ireland. Proc.
Roy. Soc., B, cii, 286.

Fisher, R. A. (1930a). The moments of the distribution for normal samples
of measures of departure from normality. Proc. Roy. Soc., A, cxxx, 16.

Fisher, R. A. (1930b). Inverse probability. Proc. Camb. Phil. Soc., xxvi, 528.

Fisher, R. A. (1935). The logic of inductive inference. Jour. Roy. Stat. Soc.,
xcviii, 39.

Fisher, R. A. (1936a). Statistical Methods for Research Workers. 6th Edition.
Oliver & Boyd.

Fisher, R. A. (1936b). The Design of Experiments. 2nd Edition. Oliver & Boyd.


Fisher, R. A., and Hoblyn, T. N. (1938). Maximum- and minimum-correlation
tables in comparative climatology. Geografiska Annaler, iii, 367.

Fisher, R. A., Thornton, H. G., and Mackenzie, W. A. (1922). The accuracy
of the plating method of estimating the density of bacterial populations.
Annals of Appl. Biology, ix, 325.

Glanville, W. H., and Reid, D. A. G. (1934). Mortar tests as a guide to
the strength of concrete. Structural Engineer, xii, 242.

Gould, C. E., and Hampton, W. M. (1936). Statistical methods applied to
the manufacture of spectacle glasses. Supplement Jour. Roy. Stat. Soc.,
iii, 137.

Harmon, G. E. (1926). On the degree of relationship between head
measurements and reaction time to sight and sound. Biometrika, xviii, 207.

Harris, J. A. (1910). On the selective elimination occurring during the
development of the fruits of Staphylea. Biometrika, vii, 452.

Harris, R. H. (1931). Relation of peptization of wheat flour protein to loaf
volume. Cereal Chemistry, viii, 47.

Irwin, J. O. (1931). Mathematical theorems involved in the analysis of
variance. Jour. Roy. Stat. Soc., xciv, 284.

Jones, H. G. (1910) (data for a note by K. P.). On the value of the teachers'
opinion of the general intelligence of school children. Biometrika,
vii, 542.

Koshal, R. S., and Turner, A. J. (1930). Studies in the sampling of cotton
for the determination of fibre-properties. Jour. Textile Inst., xxi, T325.

Latter, O. H. (1902). The egg of Cuculus canorus. Biometrika, i, 164.

Mahalanobis, P. C. (1932). Auxiliary tables for Fisher's z-test in analysis
of variance. Indian Jour. of Agri. Sci., ii, 679.

Mahalanobis, P. C. (1933). Tables for the application of L-tests. Indian
Jour. of Statistics, i, 109.

Matthews, J. R. (1923). The distribution of certain portions of the British
flora. Annals of Botany, xxxvii, 277.

Miner, J. (1922). Tables of $\sqrt{1 - r^2}$ and $1 - r^2$ for Use in Partial
Correlation and Trigonometry. Johns Hopkins Press, Baltimore.

Morton, W. E. (1926). The importance of hair weight per centimetre as a
measurable character of cotton and some indications of its practical
utility. Jour. Textile Inst., xvii, T537.

Mumford, A. A., and Young, M. (1923). The interrelationships of the
physical measurements and the vital capacity. Biometrika, xv, 109.

Nayer, P. P. N. (1936). An investigation into the application of Neyman
and Pearson's $L_1$ test, with tables of percentage limits. Statistical
Research Memoirs, i, 38.

Neyman, J. (1934). On the two different aspects of the representative
method: the method of stratified sampling and the method of purposive
selection. Jour. Roy. Stat. Soc., xcvii, 558.

Neyman, J., and Pearson, E. S. (1931). On the problem of k samples. Bull. de
l'Acad. Polonaise des Sci. et des Let., Série A, 460.

Neyman, J., and Pearson, E. S. (1928). On the use and interpretation of
certain test criteria for purposes of statistical inference. Biometrika,
xxA, 175.



REFERENCES 


^77 


Orensteen, M* M. (1920). Correlation of cephalic measnrmients in Egyptim 
bom natives. Biometrikay xiii, 17. 

Paekes a. S., and Drummond, J. C. (1925). The effect of vitenin B 
deficiency on reproduction. Proc, Boy, Soc,y B, xcviii, 147. 

Pearl, R. (1907). Variation and Differentiation in CeratopI^IJMm, Came^ 
Institution of Washington. 

Pearse, G. E. (192S). On corrections for the moment-coeffidents of freqiKncy 
distributions. Biometrikay sxa, 314. 

Pearson, E. S. (1926). A further note on the distribution of range in sanq^fcs 
taken from a normal population. Biometrikay xviii, 173. 

— (1930). A further development of tests for normality. Biomeinkay xxi, 239. 

— (1932). The percentage limits for the distribution of range in samples 

from a normal population. Biometrika, xxiv, 404, 

Pearson, K. (1895), Skew variation in homogeneous material Pfdl, Trms» 
Roy, Soc., A, clxsxvi, 343. 

— (1900). On the criterion that a given system of deviations frcmi the 

probable in the case of a correlated system of variable is sudi that it 
can be reasonably supposed to have arisen from random sampling. 
Phil Mag,y Series V, 1 , 157. 

— (1904). On the theory of contingency and its relation to a^odatkm and 

normal correlation. Drapers’ Company Research Memoirs^ i- 

— (1905). On the general theory of skew correlation and non-Hnear rt^r^ 

sion. Drapers’ Company Research Memoirsy ii. 

— (1910). On a new method of determining correlation- BiometrSaiy vE, 24S. 

— (1920). On the probable errors of frequency constants- Btcmietr^ta^ 


xui, 113. 

— (1931) (Editor). Tables for Statisticians and Biometricians. Part I, 3rd
Edition. Cambridge University Press.

— and Lee, A. (1903). On the laws of inheritance in man. Biometrika, ii, 357.

Przyborowski, J., and Wileński, H. (1935). Statistical principles of routine
work in testing clover seed for dodder. Biometrika, xxvii, 273.

Sanders, H. G., and Wishart, J. (1935). Principles and Practice of Field
Experimentation. Empire Cotton Growing Corporation.

Smith, A. M., and Prentice, E. G. (1929). Investigation on Heterodera
schachtii in Lancashire and Cheshire. Annals of Applied Biology, xvi, 324.

Snow, E. C. (1911). On the determination of the chief correlations between
collaterals in the case of a simple Mendelian population mating at
random. Proc. Roy. Soc., B, lxxxiii, 37.

“Student” (1907). On the error of counting with a haemacytometer.
Biometrika, v, 351.

— (1908). The probable error of a mean. Biometrika, vi, 1.

— (1925). New tables for testing the significance of observations. Metron,
v, No. 3, 18.

Tippett, L. H. C. (1927). Random sampling numbers. Tracts for Computers,
xv. Cambridge University Press.

— (1934). Statistical methods in textile research; uses of the binomial and
Poisson distributions. Shirley Inst. Mem., xiii, 35, or Jour. Text. Inst.,
1935, xxvi, T13.




Tower, W. L. (1902). Variation in the ray-flowers of Chrysanthemum
leucanthemum. Biometrika, i, 309.

Tschepourkowsky, E. (1905). Contributions to the study of interracial
correlation. Biometrika, iv, 386.

Warren, E. (1909). Some statistical observations on Termites. Biometrika,
vi, 329.

Weldon, W. F. R. (1901). Change in organic correlation of Ficaria
ranunculoides during the flowering season. Biometrika, i, 125.

Wheldale, M. (1907). The inheritance of flower colour in Antirrhinum
majus. Proc. Roy. Soc., B, lxxix, 288.

Winter, F. L. (1929). The mean and variability as affected by continuous
selection for composition in corn. Jour. Agri. Research, xxxix, 451.

Wishart, J. (1928). Table of significant values of the multiple correlation
coefficient. Quarterly Jour. Roy. Met. Soc., liv, 258.

Yates, F. (1933a). The formation of Latin squares for use in field
experiments. Empire Jour. Exp. Agri., i, 235.

— (1933b). The principles of orthogonality and confounding in replicated
experiments. Jour. Agri. Sci., xxiii, 108.

— (1933c). The analysis of replicated experiments when the field results
are incomplete. Empire Jour. Exp. Agri., i, 129.

— (1936a). Incomplete Latin squares. Jour. Agri. Sci., xxvi, 301.

— (1936b). A new method of arranging variety trials involving a large
number of experiments. Jour. Agri. Sci., xxvi, 424.

— (1936c). Incomplete randomised blocks. Annals of Eugenics, vii, 121.

Yule, G. U. (1927). An Introduction to the Theory of Statistics, 7th Edn.
Griffin.

Zinn, J. (1923). Correlations between various characters of wheat and flour
as determined from published data, etc. Jour. Agri. Research, xxiii, 529.



INDEX 


ages of husbands and wives, 161
American bladder nut, 126 
analysis of variance, 126, 213 
— —, computation, 130, 217, 226, 232
— —, non-uniform arrays, 132
— —, relation to correlation, 156
analysis of co-variance, use for increasing
precision of experiment, 248

arbitrary origin and scale, 36 
arithmetic mean, 28 
array, 126 

array means, curve of, 154, 192 
association, 106, 126, 140 
—, measurement of, 129, 160, 190, 202
asymmetry, 34 
attribute, 16 
average, 28 

bacterial counts, 123 
Balls, 20 
Barlow, 18 
Bateson, 103 
Beavan, 235 

34 

—, sampling errors, 86
binomial distribution, 46 

— —, moments, 48
— —, small samples, 121
— —, standard error of mean, 82

binomial distribution, use for testing 
randomness, 50 
Bowley, 209 
British Association, 73 
Brownlee, 106 

cabbage seed, 50, 102 
causation, 162 
cephalic index, 178 
Ceratophyllium, 22 
chance, 43, 44 
character, 16 

chess-board arrangement, 229 

98 


χ², additive nature, im
—, general character, 108
—, sampling distribution, 98, 108, 271
—, use for testing significance of
correlations, 178

choice of statistical constants, 92
Christidis, 228
chrysanthemum, 23
cloudiness, 23
Cochran, 54
coefficient of variation, 30
— —, standard error, 88

Collins, 114 

complex sample, standard errors,
204
computation, see under constants

—, accuracy in, 42, 2^ 
concrete, 182 

confidence limits and coefficient, 90
confounding, 241
consistency, 92
constant, statistical, 19
contingency, 105
—, coefficient, 202
—, relation to correlation, 203 
—, table, 104 
continuous variate, 16 
control, 15 
controls, use of, 234
Co-operative study, 142
Corkill, 115
corn yield, 79, 227
correlation, 140
—, analysis of variance, 156, 192, 258
—, multiple, 257
—, partial, 243
—, partial, linear, 251
—, partial, non-linear, 253
—, spurious, in time series, 254
correlation coefficient, 144
— —, combination of estimates, 177
— —, computation, 165
— —, estimation, 164





correlation coefficient, multiple, 257
— —, multiple, sampling errors, 263
— —, partial, 251
— —, partial, proof of formula, 268
— —, partial, sampling errors and significances, 261
— —, significance of differences, 176, 179
— —, standard error, 172
— —, tests of significance, 173
— —, transformation, 176

correlation ratio, 189 
correlation table, 140 
cotton hairs, lengths, 70 

— —, sampling, 68, 70, 205
— —, spiral reversals, 20
— —, weights, 263

cotton seed, 122 
co-variance, 246 

—, use to increase precision of
experiment, 248
cuckoos’ eggs, 133, 191
cumulative diagram, 20
curve fitting, frequency, 50, 56
— —, regression, 149, 195, 256

cyst counts, 123, 207 

Darbishire, 83 

degrees of freedom, 99, 110
— —, effect of fitting constants, 99, 159

difference between two samples, 
standard error, 72 
—, significance, 98, 108 
discrete variate, 16 
dispersion, 30 
distribution, 19 
Drummond, 83 

economy in sampling, 207 
Eden, 243 
efficiency, 92 
Egyptians, 180 
Elderton, 62, 100 

electric current, effect on growth, 

electric lamps, 30 
elementary schoolboys, 161 


Englishmen, heights, 73, 84, 118 
error curve, 54 
errors, 17 

—, of first and second kinds, 74 
estimation, theory of, 92 
event, 43, 47 

experimental arrangement, 210 
experimental method, 15 
experiments, agricultural, 226 

factorial experiments, 235 
fathers’ heights, 39, 101
fiducial limits and probability, 90 
Fisher, 18, 54, 57, 86, 90, 92, 96, 
100, 109, 113, 117, 123, 

147, 159, 162, 173, 176,

17% 197, 230, 240, 241, 

243, 247, 250, 261 

flatness, of mode, 33 
Flint, 114
flower colour, 102 
Fourier analysis, 50 
fraternal correlations, 177 
frequency constants, 27 
frequency curves, 54, 60 

— —, non-normal, 61
— —, normal, 54

frequency diagram, 20 
frequency distributions, 19 
— —, comparison between experimental
and theoretical, 98
— —, comparison between experimental, 107
frequency polygon, 20 
frequency surface, 144 
frequency table, 19 

— —, formation, 26
functions of statistical constants,
standard errors, 87

Gaussian distribution, 54 
geometric mean, 28 
Glanville, 182 
glycogen, 116, 119
goodness of fit, 98
Gould, 218
grain yields, 227 
grain and straw yields, 243 





grouping, 26, 140
groups of correlation coefficients, 178
groups of samples, significance of
means, 78, 136

haemacytometer, 51
half-drill strip, 235
Hampton, 218
Hancock, 20
Harmon, 148
harmonic mean, 28
Harris, J. A., 126
Harris, R. H., 257
head length, 148, 180
heterogeneity of variability, 138
— —, effect on theory of errors, 204
— —, effect on correlation, 243
histogram, 20
Hoblyn, 147, 162, 247
hollow curve, 26
houses, sampling, 68

independence, 44
individuals, 16
inductive inference, 89
infinite population, 45, 62, 67
insulin, 115, 119
insurance, 17
interaction, 222, 236
interpolation, in table of z, 119
—, linear, 52, 57
intrinsic accuracy, 93
inverse probability, 91
Irwin, 64, 159

Jones, 161

kinetic theory of gases, 27
Koshal, 70
kurtosis, 33

Latin square, 230
— —, computation, 232
— —, formation, 233
Latter, 134
law of small numbers, 49
least squares, 155
—, proof of linear equations, 167
leaves per whorl, 22
Lee, 39
level of significance, 70
— —, and probability, distinction, 76, 136
likelihood, 95 
loaf volume, 192, 257 

Mackenzie, 123 
Mahalanobis, 118, 120 
maize seedlings, 113 
marriages in churches, 162 
mathematical probability, 44 
Matthews, 23 
maximum likelihood, 95 
McLane, 114 
mean, 28 

—, differences expressed as variance,
128
—, sampling distribution, 63
—, significance, 70, 71
—, —, parallel pairs, 211
—, —, small samples, 112, 114
—, standard error, 62, 63
—, standard error in complex
population, 204
mean deviation, 30
— —, standard error, 85
mean square contingency, 202
median, 29 
Mendel, 103 
mice, 83 

Miner, 251

mode, 26, 29 

moments, computation, 34
—, deduction from continuous
frequency curve, 55
—, second, 30 
—, third and fourth, 33 
mortar, 

Morton, 141 

multiple factor experiment, 235
Mumford, 251 

Nayer, 120 
Neyman, 74, 90




non-normal frequency curves, 61
normal distribution, 54, 270
— —, determination of frequencies, 56
— —, moments, 55
— —, practical applicability, 60
— —, relation between frequencies
and standard deviation, 59
normal surface, 149 
normality, tests for, 86 

occurrence, 47 
odds, see probability 
ogive, 20 
Orensteen, 180 
orthogonal equations, 197 
orthogonality, 241 
ovules per ovary, 126, 191 

pairs, variance from, 111
parameters, 92
parent and children, 161
Parkes, 83
peakiness, see kurtosis
Pearl, 22
Pearse, 23

Pearson, E. S., 74, 79, 85, 86, 120, 
138 

Pearson, K., 18, 31, 33, 39, 57, 86, 
98, 100, 106, 176, 190 
peas, 103, 121 
Peter’s method, 30 
physical determinations, random
errors, 69
pistils, 145, 177
Poisson distribution, 48
— —, small samples, 123
— —, standard error of mean, 84
— —, tables, 49
— —, testing randomness, 51
polynomial equation, 195
— —, system of fitting, 197

population, 16
—, determination from sample, 89
—, infinite, 45, 62, 67
Prentice, 123, 207 
probability, 43 
—, test of laws, 49 


probability integral, 56 
probable error, 71 
product moment, 165 

— —, computation, 166

proportionate frequencies, 28 
protein content, of flour, 257 

— —, of wheat, 174, 192, 197, 254

Przyborowski, 54 

qualitative and quantitative variate, 
16 

quantity of information, 94, 178, 
265 

quartile deviation, 30 
— —, relation to standard deviation, 60

rabbits, 115, 119 
rainfall and sunshine, 161 
random groups or blocks, 230 
random samples, 45, 67 

— —, complex theory, 204
— —, technique, 67

random sampling numbers, 68 
randomness, tests of, 49 
range, 31 

—, standard error, 85 
ratio between two means, standard
error, 88
reaction time, 148 
recruits, 20 
regression, 149 
—, as variance, 193, 200 
—, line through origin, 265 
—, linear, 149 
—, multiple, 256 
—, non-linear, 189
—, partial, 256
—, tests for linearity, 192, 200
regression coefficients, 155
— —, computation, 165
— —, estimation, 164
— —, significances, 181
— —, standard error, 182

Reid, 182 

representative samples, 67 
residuals, 130 

—, correlation of, 244, 253 





routine analysis, determination of 
errors, 206 

sample, 16 
—, random, 45, 67 
—, small, 110
sampling, by strata, 209
—, unrestricted random method, 209
sampling distribution, 62
— —, skew, 80
— —, of various constants, see under
the constants
sampling from limited field, 209
sampling technique, 68
sampling theory, complex populations, 205
— —, small samples, 110

Sanders, 240, 250 
scatter, see dispersion 
scatter diagram, 140 
Scotsmen, heights, 73, 84, 118 
semi-interquartile distance, 31 
set, 47 

sex-ratio of rats, 83, 104 
shape of distribution, 33 
Sheppard, 57 

Sheppard’s corrections, 39, 131 

— —, effect on tests for correlation, 188

significance, statistical, 71 
—, from skew distribution, 80 
—, tests of, 69 

—, see also under various constants 
skew sampling distributions, tests of 
significance, 80 
skewness, 21, 34 
—, test for, 86 
small numbers, law of, 49 
smallpox, 106, 203 
small samples, 110
Smith, 123, 207 
smoothing, 201 
Snow, 161, 177 
Soper, 49 

spectacle glass, 218 
spiral reversal, 20 
spurious correlation, 254


stamens, 145, 177 
standard deviation, 30 

— —, estimated from two samples, 73
— —, relation to normal frequencies, 59
— —, standard error, 84
standard error, 62
— —, complex sample, 204
— —, of various constants, see under
the constants
statistical method, 15
statistical probability, 44
statistical significance,
statistics, 92
strata, 209
Student, 52, 113
sub-range, 19
summation, rules, 37
sunshine and rainfall, 161
survivor curve, 20 

t test, 112, 271 
—, essential character, 114 
—, relation to z test, 134 
temperatures, maximum and minimum,
147, 247
termites, 137, 211
tern eggs, 142, 157
Thornton, 123
time series, 197, 201
— —, correlation of, 254

Tippett, 54, 68, 120 
Tower, 23 

transformed variate, 36 
trial, 47
triplet births, 177 
Tschepourkowsky, 178 
Turner, 70 

universe, 16 

vaccination, 106, 203 
variability, 30 
variable, 16 

—, dependent and independent, 154

variance, 30 
—, additive nature, 125 


variance, analysis, 125
—, analysis, computation, 130, 134,
217, 232
—, analysis, effect of skewness, 138
—, analysis into many parts, 213
—, associated with regression, 156,
192, 200, 259
—, computation, 34
—, estimation from small samples,
110
—, sampling errors in small samples,
117, 120
—, standard error, 88
variate, 16
vice-counties, 23
vital capacity, 251
vitamin B, 83, 105

waltzing mice, 83
Warren, 137
weaving of cotton warps, 235
weighting, see quantity of information

Weldon, 145, 161 
Wheldale, 102 
Wilenski, 54 
Willis, 26

^ iiy 

Winter, 175 
Wishart, 240, 250, 262 
working mean, 36 

Yates, 18, 234, 241 
yeast cells, 51, 84 
Young, 251 
Yule, 161, 162 

z, interpolation in tables of, 119 
z test, 117, 272
—, relation to t test, 134
z', transformation for correlation
coefficient, 176
Zinn, 192 









